BACKGROUND OF THE INVENTION

Feed formulations based on crop plants must typically be supplemented with
specific amino acids to provide animals with essential nutrients which are necessary
for their growth. This supplementation is necessary because, in general, crop plants
contain low proportions of several amino acids which are essential for, and cannot
be synthesized by, monogastric animals.

The seeds of crop plants contain different classes of seed proteins. The
amino acid composition of these seeds reflects the composition of the prevalent
classes of proteins. Amino acid limitations are usually due to amino acid
deficiencies of these prevalent protein classes.

Among the amino acids necessary for animal nutrition, those that are of 
limited availability in crop plants include methionine, lysine, and threonine. Attempts 
to increase the levels of these amino acids by breeding, mutant selection, and/or
changing the composition of the storage proteins accumulated in the seeds of crop 
plants, have met with limited success, or were accompanied by a loss in yield. 

For example, although seeds of corn plants containing a mutant transcription
factor (opaque 2), or a mutant α-zein gene (floury 2), exhibit elevated levels of total
and bound lysine, there is an altered seed endosperm structure which is more
susceptible to damage and pests. Significant yield losses are also typical.



An alternative means to enhance levels of free amino acids in a crop plant is 
the modification of amino acid biosynthesis in the plant. The introduction of a 
feedback-regulation-insensitive dihydrodipicolinic acid synthase ("DHDPS") gene, 
which encodes an enzyme that catalyzes the first reaction unique to the lysine 
biosynthetic pathway, into plants has resulted in an increase in the levels of free 
lysine in the leaves and seeds of those plants. However, an increase in the levels of
free lysine in the embryo results in a reduced amount of oil in the seed. Further, free
lysine can be lost during the wet milling process, reducing the feed value of the gluten
product of the process.

The expression of the lysC gene, which encodes a mutant bacterial aspartate 
kinase that is desensitized to feedback inhibition by lysine and threonine, from a 
seed-specific promoter in tobacco plants, has resulted in an increase in methionine 
and threonine biosynthesis in the seeds of those plants. See Karchi, et al.; The
Plant J.; Vol. 3; p. 721; (1993). However, expression of the lysC gene results in only
a 6-7% increase in the level of total threonine or methionine in the seed. The
expression of the lysC gene in seeds has a minimal impact on the nutritional value of
those seeds and, thus, supplementation of feed containing lysC transgenic seeds 
with amino acids, such as methionine and threonine, is still required. 

There are additional molecular genetic strategies available for enhancing the
amino acid quality of plant proteins. Each involves molecular manipulation of plant
genes and the generation of transgenic plants.

Protein sequence modification involves the identification of a gene encoding a
major protein, preferably a storage protein, as the target for modification to contain
more codons of essential amino acids. An important aspect of this approach is to be
able to select a region of the protein that can be modified without affecting the
overall structure, stability, function, and other cellular and nutritional properties of the
protein.

The development of DNA synthesis technology allows the design and 
synthesis of a gene encoding a new protein with desirable essential amino acid 
compositions. For example, researchers have synthesized a 292-base pair DNA 
sequence encoding a polypeptide composed of 80% essential amino acids and used 
it with the nopaline synthetase (NOS) promoter to construct a chimeric gene. 
Expression of this gene in the tuber of transgenic potato has resulted in an 
accumulation of this protein at a level of 0.02% to 0.35% of the total plant protein. 
This low level accumulation is possibly due to the weak NOS promoter and/or the 
instability of the new protein. 

Tobacco has been used as a test plant to demonstrate the feasibility of this 
approach by transferring a chimeric gene containing the bean phaseolin promoter 
and the cDNA of a sulfur-rich protein Brazil Nut Protein ("BNP"), (18 mol% 
methionine and 8 mol% cysteine) into tobacco. Amino acid analysis indicates that 
the methionine content in the transgenic seeds is enhanced by 30% over that of the 
untransformed seeds. This same chimeric gene has also been transferred into a 
commercial crop, canola, and similar levels of enhancement were achieved. 

However, an adverse effect is that lysine content decreases. Additionally, 
BNP has been identified as a major food allergen. Thus it is neither practical nor 
desirable to use BNP to enhance the nutritional value of crop plants. 



Thus, there is a need to improve the nutritional value of plant seeds. The 
genetic modification should not be accompanied by detrimental side effects such as 
allergenicity, anti-nutritional quality or poor yield. 

SUMMARY OF THE INVENTION 
It is an object of the present invention to provide a seed, the endosperm of 
which contains elevated levels of an essential amino acid. 

It is a further object of the present invention to provide methods for increasing 
the nutritional value of feed. 

It is a further object of the present invention to provide methods for genetically 
modifying seeds so as to increase amounts of essential amino acids which are 
present in relatively low amounts in unmodified seeds. 

It is a further object of the present invention to provide methods for increasing 
the nutritional content of seeds without detrimental side effects such as allergenicity 
or anti-nutritional quality. 

It is a further object of the present invention to provide methods for increasing 
the nutritional content of seeds while maintaining a high yield. 

It is a further object of the present invention to provide a method for the 
expression of a polypeptide in a seed having levels of a preselected amino acid 
sufficient to reduce or obviate feed supplementation. 

According to the present invention a transformed plant seed is provided, the 
endosperm of which is characterized as having an elevated level of at least one 
preselected amino acid compared to a seed from a corresponding plant which has 
not been transformed, wherein the amino acid is lysine, threonine, or tryptophan and 
optionally a sulfur-containing amino acid. 



Also provided is a seed from a plant which has been transformed to express a 
heterologous protein in the endosperm of the seed, wherein the seed exhibits an 
elevated level of an essential amino acid. 

An expression cassette is also provided comprising a seed endosperm- 
preferred promoter operably linked to a structural gene encoding a polypeptide 
having an elevated level of a preselected amino acid. Transformed plants and seeds 
containing the expression cassette are also provided. 

A method for elevating the level of a preselected amino acid in the 
endosperm of plant seed is also provided. The method comprises the 
transformation of plant cells by introducing the expression cassette, recovering the 
transformed cells, regenerating a transformed plant and collecting the seeds 
therefrom. 

DETAILED DESCRIPTION OF THE INVENTION 

As used herein, a "structural gene" means an exogenous or recombinant
DNA sequence or segment that encodes a polypeptide. 

As used herein, "recombinant DNA" is a DNA sequence or segment that has 
been isolated from a cell, purified, synthesized or amplified. 

As used herein, "isolated" means either physically isolated from the cell or 
synthesized in vitro on the basis of the sequence of an isolated DNA segment. 

As used herein, the term "increased" or "elevated" levels of the preselected 
amino acid in a protein means that the protein contains an elevated amount of a 
preselected amino acid compared to the amount in an average protein. 



As used herein, "increased" or "elevated" levels or amounts of preselected 
amino acids in a transformed plant or seed are levels which are greater than the 
levels or amounts in the corresponding untransformed plant or seed. 

As used herein, "polypeptide" means proteins, protein fragments, modified 
proteins, amino acid sequences and synthetic amino acid sequences. 

As used herein, "transformed plant" means a plant which comprises a 
structural gene which is introduced into the genome of the plant by transformation. 

As used herein, "untransformed plant" refers to a wild type plant, i.e., one 
where the genome has not been altered by the introduction of the structural gene. 

As used herein, "plant" includes but is not limited to plant cells, plant tissue 
and plant seeds. 

As used herein, "seed endosperm-preferred promoter" is a promoter which 
preferentially promotes expression of the structural gene in the endosperm of the 
seed. 

As used herein with respect to a structural gene encoding a polypeptide, the 
term "expresses" means that the structural gene is incorporated into the genome of 
cells, so that the product encoded by the structural gene is produced within the cells. 

As used herein, the term "essential amino acid" means an amino acid which 
is synthesized only by plants or microorganisms or which is not produced by animals 
in sufficient quantities to support normal growth and development. 

As used herein, the term "high lysine content protein" means that the protein 
has at least about 7 mole % lysine, preferably about 7 mole % to about 50 mole % 
lysine, more preferably about 7 mole % to about 40 mole % lysine and most 
preferably about 7 mole % to about 30 mole %. 



As used herein, the term "high sulfur content protein" means that the protein 
contains at least about 6 mole % methionine and/or cysteine, preferably about 6 
mole % to about 40 mole %, more preferably about 6 mole % to about 30 mole % 
and most preferably 6 mole % to 25 mole %. 
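The two composition thresholds defined above can be checked mechanically. The following is a minimal sketch, assuming a protein is supplied as a string of one-letter amino acid codes; the function names and the toy sequence are illustrative assumptions and do not appear in the specification.

```python
from collections import Counter

HIGH_LYSINE_MIN = 7.0  # mole %, lower bound of "high lysine content protein"
HIGH_SULFUR_MIN = 6.0  # mole %, lower bound of "high sulfur content protein"


def mole_percent(sequence, residues):
    """Percentage of positions in `sequence` occupied by any of `residues`."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence.upper())
    return 100.0 * sum(counts[r] for r in residues) / len(sequence)


def is_high_lysine(sequence):
    """True if lysine (K) reaches the 7 mole % threshold."""
    return mole_percent(sequence, "K") >= HIGH_LYSINE_MIN


def is_high_sulfur(sequence):
    """True if methionine (M) plus cysteine (C) reach the 6 mole % threshold."""
    return mole_percent(sequence, "MC") >= HIGH_SULFUR_MIN


# Hypothetical 20-residue sequence used only to exercise the functions.
toy = "MKKCAAAAAAAAAAAAAAAA"
print(mole_percent(toy, "K"), is_high_lysine(toy), is_high_sulfur(toy))
```

Because the definitions say "at least about", the hard cut-offs used here are a simplification.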

The present invention provides a transformed plant seed, the endosperm of 
which is characterized as having an elevated level of a preselected amino acid 
compared to the seed of a corresponding plant which has not been transformed. It 
is preferred that the level of preselected amino acid is elevated in the endosperm in 
preference to other parts of the seed. 

The preselected amino acid is an essential amino acid such as lysine,
cysteine, methionine, threonine, tryptophan, arginine, valine, leucine, isoleucine,
histidine or combinations thereof. Preferably, the preselected amino acid is lysine,
threonine, cysteine, tryptophan, or combinations thereof and optionally methionine.
It is especially preferred that the polypeptide has an increased content of lysine as 
well as a sulfur containing amino acid, i.e., methionine and/or cysteine. 

The polypeptide can be an endogenous or heterologous protein. When an 
endogenous protein is expressed, the preselected amino acid is lysine, cysteine, 
threonine, tryptophan, arginine, valine, leucine, isoleucine, histidine or combinations 
thereof and optionally methionine. When the protein is a heterologous protein, any 
of the above described preselected amino acids or combinations thereof is present 
in elevated amounts. 

Generally the amount of preselected amino acid in the seed of the present
invention is at least about 10 percent by weight greater than in a corresponding
untransformed seed, preferably about 10 percent by weight to about 10 times
greater, more preferably about 15 percent by weight to about 10 times greater and
most preferably about 20 percent to about 10 times greater.

A polypeptide having an elevated amount of the preselected amino acid is 
expressed in the transformed plant seed endosperm in an amount sufficient to 
increase the amount of at least one preselected amino acid in the seed of the 
transformed plant, relative to the amount of the preselected amino acid in the seed 
of a corresponding untransformed plant. 

The choice of the structural gene is based on the desired amino acid 
composition of the polypeptide encoded by the structural gene, and the ability of the 
polypeptide to accumulate in seeds. The amino acid composition of the polypeptide 
can be manipulated by methods, such as site-directed mutagenesis of the structural 
gene encoding the polypeptide, so as to result in expression of a polypeptide that is 
increased in the amount of a particular amino acid. For example, site-directed 
mutagenesis can be used to increase levels of lysine, methionine, cysteine, 
threonine and/or tryptophan and/or to decrease levels of asparagine and/or 
glutamine. 

The derivatives differ from the wild-type protein by one or more amino acid 
substitutions, insertions, deletions or the like. Typically, amino acid substitutions are 
conservative. In the regions of homology to the native sequence, variants preferably 
have at least 90% amino acid sequence identity, more preferably at least 95% 
identity. 

Typical examples of suitable proteins include barley chymotrypsin inhibitor, 
barley alpha hordothionin, soybean 2S albumin proteins, rice high methionine 
protein and sunflower high methionine protein and derivatives of each protein. 



Barley alpha hordothionin has been modified to increase the level of particular 
amino acids. The sequences of genes which express modified alpha hordothionin 
proteins with enhanced essential amino acids are based on the mRNA sequence of 
the native Hordeum vulgare alpha hordothionin gene (accession number X05901, 
Ponz et al. 1986 Eur. J. Biochem. 156:131-135).

Modified hordothionin proteins are described in U.S. Ser. Nos. 08/838,763 
filed April 10, 1997; 08/824,379 filed March 26, 1997; 08/824,382 filed March 26, 
1997; and U.S. Pat. No. 5,703,409 issued December 30, 1997 the disclosures of 
which are incorporated herein in their entirety by reference. 

Alpha hordothionin is a 45-amino acid protein which is stabilized by four 
disulfide bonds resulting from eight cysteine residues. In its native form, the protein 
is especially rich in arginine and lysine residues, containing 5 residues (10%) of 
each. However, it is devoid of the essential amino acid methionine. 

Alpha hordothionin has been modified to increase the amount of various 
amino acids such as lysine, threonine or methionine. The protein has been 
synthesized and the three-dimensional structure determined by computer modeling. 
The modeling of the protein predicts that the ten charged residues (arginine at 
positions 5, 10, 17, 19 and 30, and lysine at positions 1, 23, 32, 38 and 45) all occur 
on the surface of the molecule. The side chains of the polar amino acids 
(asparagine at position 11, glutamine at position 22 and threonine at position 41) 
also occur on the surface of the molecule. Furthermore, the hydrophobic amino 
acids, (such as the side chains of leucine at positions 8, 15, 24 and 33 and valine at 
position 18) are also solvent-accessible.



Three-dimensional modeling of the protein indicates that the arginine
residue at position 10 is important to retention of the appropriate 3-dimensional
structure and possible folding through hydrogen bond interactions with the
C-terminal residue of the protein. A lysine, methionine or threonine substitution at
that point would disrupt this hydrogen bonding network, leading to a destabilization
of the structure. The synthetic peptide having this substitution could not be made to
fold correctly, which supported this analysis. Conservation of the arginine residue at
position 10 provides a protein which folds correctly.

Alpha hordothionin has been modified to contain 12 lysine residues in the
mature hordothionin peptide, referred to as HT12 (Rao et al. 1994 Protein
Engineering 7(12): 1485-1493 and WO 94/16078 published July 21, 1994). The
disclosure of each of these is incorporated herein by reference in its entirety.

Further analysis of substitutions which would not alter the 3-dimensional 
structure of the molecule led to replacement of Asparagine-11, Glutamine-22 and
Threonine-41 with lysine residues with virtually no steric hindrance. The resulting 
compound contains 27% lysine residues. 

Other combinations of these substitutions were also made, including variants in
which the amino acid residues at one or more of positions 5, 11, 17, 19, 22, 30 and
41 are lysine, and the remainder of the residues at those positions are the residues
at the corresponding positions in the wild type hordothionin.
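The lysine substitutions described above can be sketched as a lookup keyed by residue position. This is an illustrative sketch only: the dictionary holds just the surface residues the text names (the full 45-residue sequence is not reproduced here), the function and variable names are assumptions, and the example assumes that all seven listed positions are changed to lysine.

```python
TOTAL_RESIDUES = 45

# Wild-type residues at the 1-based positions named in the text.
WILD_TYPE = {
    1: "K", 5: "R", 8: "L", 10: "R", 11: "N", 15: "L", 17: "R",
    18: "V", 19: "R", 22: "Q", 23: "K", 24: "L", 30: "R", 32: "K",
    33: "L", 38: "K", 41: "T", 45: "K",
}

# Arginine-10 is described as necessary for correct folding.
PROTECTED = frozenset({10})


def substitute(residues, positions, new_residue, protected=PROTECTED):
    """Return a copy of `residues` with `new_residue` at each of `positions`."""
    variant = dict(residues)
    for pos in positions:
        if pos in protected:
            raise ValueError(f"position {pos} is conserved and cannot be changed")
        if pos not in variant:
            raise KeyError(f"position {pos} is not a listed surface residue")
        variant[pos] = new_residue
    return variant


lysine_sites = [5, 11, 17, 19, 22, 30, 41]
variant = substitute(WILD_TYPE, lysine_sites, "K")
lysine_count = sum(1 for aa in variant.values() if aa == "K")
lysine_percent = 100 * lysine_count / TOTAL_RESIDUES
print(lysine_count, round(lysine_percent))
```

Under that assumption the count comes to 12 lysines in 45 residues, which is consistent with the 12 lysine residues and the rounded 27% figure given above.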

Since threonine is a polar amino acid, the surface polar amino acid residues,
asparagine at position 11 and glutamine at position 22, can be substituted; and the
charged amino acids, lysine at positions 1, 23, 32 and 38 and arginine at positions 5,
17, 19, and 30, can also be substituted with threonine. The molecule can be
synthesized by solid phase peptide synthesis.

While the above sequence is illustrative of the present invention, it is not 
intended to be a limitation. Threonine substitutions can also be performed at 
positions containing charged amino acids. Only arginine at position 10 and lysine at 
position 45 are important for maintaining the structure of the protein. One can also 
substitute at the sites having hydrophobic amino acids. These include positions 8, 
15, 18 and 24. 

Since methionine is a hydrophobic amino acid, the surface hydrophobic 
amino acid residues, leucine at positions 8, 15, and 33, and valine at position 18, 
were substituted with methionine. The surface polar amino acids, asparagine at 
position 11, glutamine at position 22 and threonine at position 41, are substituted 
with methionine. The molecule is synthesized by solid phase peptide synthesis and 
folds into a stable structure. It has seven methionine residues (15.5%) and, 
including the eight cysteines, the modified protein has a sulfur amino acid content of 
33%. 
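The sulfur amino acid figures quoted above follow from simple counting, as the short check below illustrates. The variable names are assumptions made for illustration; the counts (45 residues, eight cysteines, seven substituted positions) are taken from the text.

```python
TOTAL_RESIDUES = 45
CYSTEINES = 8

# Positions replaced by methionine: hydrophobic 8, 15, 18, 33 and polar 11, 22, 41.
methionine_sites = [8, 15, 18, 33, 11, 22, 41]

methionine_percent = 100 * len(methionine_sites) / TOTAL_RESIDUES
sulfur_percent = 100 * (len(methionine_sites) + CYSTEINES) / TOTAL_RESIDUES

print(f"methionine: {methionine_percent:.1f} mole %")
print(f"methionine + cysteine: {sulfur_percent:.1f} mole %")
```

Seven of 45 residues is about 15.6 mole %, close to the 15.5% stated, and 15 of 45 is about 33 mole %.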

While the above-described proteins are illustrative of suitable polypeptides 
which can be expressed in the transformed plant, it is not intended to be a limitation. 
Methionine substitutions can also be performed at positions containing charged 
amino acids. Only arginine at position 10 is important for maintaining the structure 
of the protein through a hydrogen-bonding network with serine at position 2 and 
lysine at position 45. Thus, one can substitute methionine for lysine at positions 1, 
23, 32, and/or 38, and for arginine at positions 5, 17, 19 and/or 30. 



Many other proteins are also appropriate. For example, the protein encoded by
the structural gene can be a lysine and/or sulfur rich seed protein, such as the
soybean 2S albumin described in U.S. Ser. No. 08/618,911 filed March 20, 1996,
and the chymotrypsin inhibitor from barley, Williamson et al., Eur. J. Biochem. 165:
99-106 (1987), the disclosures of each of which are incorporated by reference.

Derivatives of these genes can be made by site directed mutagenesis to
increase the level of preselected amino acids in the encoded polypeptide. For
example, the gene encoding the barley high lysine polypeptide (BHL) is derived
from barley chymotrypsin inhibitor, U.S. Ser. No. 08/740,682 filed November 1, 1996
and PCT/US97/20441 filed October 31, 1997, the disclosures of each of which are
incorporated herein by reference. The gene encoding the enhanced soybean
albumin (ESA) is derived from soybean 2S albumin described in U.S. Ser.
No. 08/618,911, the disclosure of which is incorporated herein in its entirety by
reference.

Other examples of sulfur-rich plant proteins within the scope of the invention
include plant proteins enriched in cysteine but not methionine, such as the wheat
endosperm purothionine (Mak and Jones; Can. J. Biochem.; Vol. 22; p. 83J; (1976);
incorporated herein in its entirety by reference), the pea low molecular weight
albumins (Higgins, et al.; J. Biol. Chem.; Vol. 261; p. 11124; (1986); incorporated
herein in its entirety by reference) as well as 2S albumin genes from other
organisms. See, for example, Coulter, et al.; J. Exp. Bot.; Vol. 41; p. 1541; (1990);
incorporated herein in its entirety by reference.

Such proteins also include methionine-rich plant proteins such as from
sunflower seed (Lilley, et al.; In: Proceedings of the World Congress on Vegetable
Protein Utilization in Human Foods and Animal Feedstuffs; Applewhite, H. (ed.);
American Oil Chemists Soc.; Champaign, IL; pp. 497-502; (1989); incorporated
herein in its entirety by reference), corn (Pedersen, et al.; J. Biol. Chem.; Vol. 261; p.
6279; (1986); Kirihara, et al.; Gene; Vol. 71; p. 359; (1988); both incorporated herein
in their entirety by reference), and rice (Musumura, et al.; Plant Mol. Biol.; Vol. 12; p.
123; (1989); incorporated herein in its entirety by reference).

The present invention also provides a method for genetically modifying plants 
to increase the level of at least one preselected amino acid in the endosperm of the 
seed so as to enhance the nutritional value of the seeds. 

The method comprises the introduction of an expression cassette into 
regenerable plant cells to yield transformed plant cells. The expression cassette 
comprises a seed endosperm-preferred promoter operably linked to a structural 
gene encoding a polypeptide elevated in content of the preselected amino acid. 

A fertile transformed plant is regenerated from the transformed cells, and 
seeds are isolated from the plant. The structural gene is transmitted through a 
complete normal sexual cycle of the transformed plant to the next generation. 

The polypeptide is synthesized in the endosperm of seed of the plant which 
has been transformed by insertion of the expression cassette described above. 
The sequence for the nucleotide molecule, either RNA or DNA, can readily be 
derived from the amino acid sequence for the selected polypeptide using standard 
reference texts. 
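Deriving a nucleotide sequence from an amino acid sequence, as just described, amounts to a lookup in the standard genetic code. The sketch below is illustrative only: it builds the standard code table, picks one codon per amino acid arbitrarily (the first in table order, with no attempt to match the codon usage of maize or any other host), and uses a hypothetical five-residue peptide rather than any sequence from this specification.

```python
from itertools import product

# Standard genetic code, listed in TCAG order with the third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One codon per amino acid: simply the first encountered (an arbitrary choice).
CHOSEN_CODON = {}
for codon, aa in CODON_TABLE.items():
    CHOSEN_CODON.setdefault(aa, codon)


def back_translate(peptide):
    """Return one DNA sequence that encodes `peptide` (one-letter codes)."""
    return "".join(CHOSEN_CODON[aa] for aa in peptide.upper())


def translate(dna):
    """Translate a DNA coding sequence whose length is a multiple of three."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


peptide = "MKCRT"  # hypothetical peptide
dna = back_translate(peptide)
print(dna, translate(dna))
```

Because most amino acids are encoded by several codons, many different nucleotide sequences encode the same polypeptide; the round trip through `translate` confirms only that the chosen sequence is one of them.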

Plants which can be used in the method of the invention include
monocotyledonous cereal plants. Preferred plants include maize, wheat, rice,
barley, oats, sorghum, millet and rye. The most preferred plant is maize.

Seeds derived from plants regenerated from transformed plant cells, plant
parts or plant tissues, or progeny derived from the regenerated transformed plants,
may be used directly as feed or food, or further processing may occur.

Transformation 

The transformation of plants in accordance with the invention may be carried
out in essentially any of the various ways known to those skilled in the art of plant
molecular biology. These include, but are not limited to, microprojectile
bombardment, microinjection, electroporation of protoplasts or cells comprising
partial cell walls, and Agrobacterium-mediated DNA transfer.

I. DNA Used for Transformation

DNA useful for introduction into plant cells includes DNA that has been
derived or isolated from any source, that may be subsequently characterized as to
structure, size and/or function, chemically altered, and later introduced into the plant.

An example of DNA "derived" from a source, would be a DNA sequence or 
segment that is identified as a useful fragment within a given organism, and which is 
then synthesized in essentially pure form. An example of such DNA "isolated" from 
a source would be a useful DNA sequence that is excised or removed from the
source by chemical means, e.g., by the use of restriction endonucleases, so that it 
can be further manipulated, e.g., amplified, for use in the invention, by the 
methodology of genetic engineering. 

Therefore, useful DNA includes completely synthetic DNA, semi-synthetic
DNA, DNA isolated from biological sources, and DNA derived from RNA. The DNA 
isolated from biological sources, or DNA derived from RNA, includes, but is not 
limited to, DNA or RNA from plant genes, and non-plant genes such as those from 
bacteria, yeasts, animals or viruses. The DNA or RNA can include modified genes, 
portions of genes, or chimeric genes, including genes from the same or different 
genotype. 

The term "chimeric gene" or "chimeric DNA" is defined as a gene or DNA 
sequence or segment comprising at least two DNA sequences or segments from 
species which do not recombine DNA under natural conditions, or which DNA 
sequences or segments are positioned or linked in a manner which does not 
normally occur in the native genome of an untransformed plant.

A structural gene of the invention can be identified by standard methods, e.g., 
enrichment protocols, or probes, directed to the isolation of particular nucleotide or 
amino acid sequences. The structural gene can be identified by obtaining and/or 
screening of a DNA or cDNA library generated from nucleic acid derived from a 
particular cell type, cell line, primary cells, or tissue. 

Screening for DNA fragments that encode all or a portion of the structural 
gene can be accomplished by screening plaques from a genomic or cDNA library for 
hybridization to a probe of the structural gene from other organisms or by screening 
plaques from a cDNA expression library for binding to antibodies that specifically 
recognize the polypeptide encoded by the structural gene. 

DNA fragments that hybridize to a structural gene probe from other organisms
and/or plaques carrying DNA fragments that are immunoreactive with antibodies to 
the polypeptide encoded by the structural gene can be subcloned into a vector and 
sequenced and/or used as probes to identify other cDNA or genomic sequences 
encoding all or a portion of the structural gene. 

Portions of the genomic copy or copies of the structural gene can be partially 
sequenced and identified by standard methods including either DNA sequence 
homology to other homologous genes or by comparison of encoded amino acid 
sequences to known polypeptide sequences. 

Once portions of the structural gene are identified, complete copies of the 
structural gene can be obtained by standard methods, including cloning or 
polymerase chain reaction (PCR) synthesis using oligonucleotide primers 
complementary to the structural gene. The presence of an isolated full-length copy 
of the structural gene can be verified by comparison of its deduced amino acid 
sequence with the amino acid sequence of native polypeptide sequences. 
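The idea of oligonucleotide primers complementary to the structural gene can be illustrated with a short sketch. It assumes the target is given as the coding-strand sequence written 5' to 3'; the function names and the toy template are hypothetical. Practical primer design also weighs melting temperature, secondary structure and specificity, none of which is modeled here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def primer_pair(template, length=18):
    """Return (forward, reverse) primers flanking the whole of `template`.

    The forward primer matches the 5' end of the coding strand; the reverse
    primer is the reverse complement of its 3' end.
    """
    if len(template) < 2 * length:
        raise ValueError("template is too short for two non-overlapping primers")
    template = template.upper()
    return template[:length], reverse_complement(template[-length:])


# Hypothetical 18-base template; primers shortened to 4 bases for readability.
forward, reverse = primer_pair("ATGAAACCCGGGTTTTGA", length=4)
print(forward, reverse)
```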

As discussed above, the structural gene encoding the polypeptide can be 
modified to increase the content of particular amino acid residues in that polypeptide 
by methods well known to the art, including, but not limited to, site-directed 
mutagenesis. Thus, derivatives of naturally occurring polypeptides can be made by 
nucleotide substitution of the structural gene so as to result in a polypeptide having a 
different amino acid at the position in the polypeptide which corresponds to the 
codon with the nucleotide substitution. The introduction of multiple amino acid 
changes in a polypeptide can result in a polypeptide which is significantly enriched in 
a preselected amino acid. 
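The relationship between a nucleotide substitution and the resulting amino acid change can be sketched as follows. The example is a toy under stated assumptions: the 12-base coding sequence is hypothetical and is not taken from any gene named in this specification, and the function names are illustrative. It shows a single-base change converting an asparagine codon (AAC) into a lysine codon (AAG).

```python
from itertools import product

# Standard genetic code, listed in TCAG order with the third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence whose length is a multiple of three."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def substitute_codon(coding_sequence, position, new_codon):
    """Replace the codon for the 1-based residue `position` with `new_codon`."""
    start = 3 * (position - 1)
    if position < 1 or start + 3 > len(coding_sequence):
        raise ValueError("position is outside the coding sequence")
    if len(new_codon) != 3:
        raise ValueError("a codon has exactly three bases")
    return coding_sequence[:start] + new_codon.upper() + coding_sequence[start + 3:]


toy_gene = "ATGAACCAGACC"  # hypothetical; encodes Met-Asn-Gln-Thr
mutant = substitute_codon(toy_gene, 2, "AAG")  # Asn (AAC) -> Lys (AAG)
print(translate(toy_gene), "->", translate(mutant))
```

Repeating the substitution at several positions gives the multiply substituted, enriched polypeptides described above.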



As noted above, the choice of the polypeptide encoded by the structural gene 
will be based on the amino acid composition of the polypeptide and its ability to 
accumulate in seeds. The amino acid can be chosen for its nutritional value to 
produce a value-added trait to the plant or plant part. Amino acids desirable for 
value-added traits, as well as a source to limit synthesis of an endogenous protein 
include, but are not limited to, lysine, threonine, tryptophan, methionine, and 
cysteine. 

Expression Cassettes and Expression Vectors

According to the present invention, a structural gene is identified, isolated, 
and combined with a seed endosperm-preferred promoter to provide a recombinant 
expression cassette. 

The construction of such expression cassettes which can be employed in
conjunction with the present invention is well known to those of skill in the art in
light of the present disclosure. See, e.g., Sambrook, et al.; Molecular Cloning: A
Laboratory Manual; Cold Spring Harbor, New York; (1989); Gelvin, et al.; Plant
Molecular Biology Manual; (1990); Plant Biotechnology: Commercial Prospects and
Problems, eds Prakash, et al.; Oxford & IBH Publishing Co.; New Delhi, India;
(1993); and Heslot, et al.; Molecular Biology and Genetic Engineering of Yeasts;
CRC Press, Inc., USA; (1992); each incorporated herein in its entirety by reference.

Preferred promoters useful in the practice of the invention are those seed 
endosperm-preferred promoters that allow expression of the structural gene 
selectively in seed endosperm to avoid any potential deleterious effects associated 
with the expression of the structural gene in the embryo. 

It has been found that when endosperm-preferred promoters are employed,
the total level of the preselected amino acid in the seed is increased compared to a
seed produced by employing an embryo-preferred promoter, such as the globulin1
promoter. When the globulin1 promoter is employed, the polypeptide is expressed
by the structural gene, but the total amount of the preselected amino acid is not
increased.

Examples of suitable promoters include, but are not limited to, the 27 kD gamma
zein promoter and the waxy promoter. See the following citations relating to the
27 kD gamma zein promoter: Boronat, A., Martinez, M.C., Reina, M., Puigdomenech, P.
and Palau, J.; Isolation and sequencing of a 28 kD glutelin-2 gene from maize: Common
elements in the 5' flanking regions among zein and glutelin genes; Plant Sci. 47,
95-102 (1986) and Reina, M., Ponte, I., Guillen, P., Boronat, A. and Palau, J., Sequence
analysis of a genomic clone encoding a Zc2 protein from Zea mays W64 A, Nucleic
Acids Res. 18 (21), 6426 (1990). See the following citation relating to the waxy
promoter: Kloesgen, R.B., Gierl, A., Schwarz-Sommer, Zs. and Saedler, H., Molecular
analysis of the waxy locus of Zea mays, Mol. Gen. Genet. 203, 237-244 (1986). The
disclosures of each of these are incorporated herein by reference in their entirety.

However, other endosperm-preferred promoters can be employed.

II. DELIVERY OF DNA TO CELLS

The expression cassette or vector can be introduced into prokaryotic or
eukaryotic cells by currently available methods which are described in the literature.
See for example, Weising et al., Ann. Rev. Genet. 2: 421-477 (1988). For example,
the expression cassette or vector can be introduced into plant cells by methods
including, but not limited to, Agrobacterium-mediated transformation, electroporation,
PEG poration, microprojectile bombardment, microinjection of plant cell protoplasts
or embryogenic callus, silicon fiber delivery, infectious viruses or viroids such as
retroviruses, the use of liposomes and the like, all in accordance with well-known
procedures.

The introduction of DNA constructs using polyethylene glycol precipitation is 
described in Paszkowski et al., Embo J. 3: 2717-2722 (1984). Electroporation 
techniques are described in Fromm et al., Proc. Natl. Acad. Sci. 82: 5324 (1985). 
Ballistic transformation techniques are described in Klein et al., Nature 327: 70-73
(1987). The disclosure of each of these is incorporated herein in its entirety by 
reference. 

Introduction and expression of foreign genes in plants has been shown to be 
possible using the T-DNA of the tumor-inducing (Ti) plasmid of Agrobacterium 
tumefaciens. Using recombinant DNA techniques and bacterial genetics, a wide 
variety of foreign DNAs can be inserted into T-DNA in Agrobacterium. Following 
infection by the bacterium containing the recombinant Ti plasmid, the foreign DNA is 
inserted into the host plant chromosomes, thus producing a genetically engineered 
cell and eventually a genetically engineered plant. A second approach is to 
introduce root-inducing (Ri) plasmids as the gene vectors. 

Agrobacterium tumefaciens-mediated transformation techniques are well 
described in the literature. See, for example, Horsch et al., Science 223: 496-498 
(1984), and Fraley et al., Proc. Natl. Acad. Sci. 80: 4803 (1983). Agrobacterium 
transformation of maize is described in U.S. Patent No. 5,550,318. The disclosure 
of each of these is incorporated herein in its entirety by reference. 

Other methods of transfection or transformation include (1) Agrobacterium 
rhizogenes-mediated transformation (see, e.g., Lichtenstein and Fuller, In: Genetic 
Engineering, vol. 6, PWJ Rigby, Ed., London, Academic Press, 1987; and 
Lichtenstein, C. P., and Draper, J., In: DNA Cloning, Vol. II, D. M. Glover, Ed., 
Oxford, IRL Press, 1985; Application PCT/US87/02512 (WO 88/02405 published 
Apr. 7, 1988) describes the use of A. rhizogenes strain A4 and its Ri plasmid along 
with A. tumefaciens vectors pARC8 or pARC16), (2) liposome-mediated DNA uptake 
(see, e.g., Freeman et al., Plant Cell Physiol. 25: 1353, 1984), and (3) the vortexing 
method (see, e.g., Kindle, Proc. Natl. Acad. Sci. USA 87: 1228 (1990)). The 
disclosure of each of these is incorporated herein in its entirety by reference. 

DNA can also be introduced into plants by direct DNA transfer into pollen as 
described by Zhou et al., Methods in Enzymology, 101:433 (1983); D. Hess, Intern. 
Rev. Cytol. 107:367 (1987); Luo et al., Plant Mol. Biol. Reporter, 6:165 (1988). The 
disclosure of each of these is incorporated herein in its entirety by reference. 

Expression of polypeptide coding genes can be obtained by injection of the 
DNA into reproductive organs of a plant as described by Pena et al., Nature 
325:274 (1987), the disclosure of which is incorporated herein in its entirety by 
reference. 

DNA can also be injected directly into the cells of immature embryos, or introduced by the 
rehydration of desiccated embryos, as described by Neuhaus et al., Theor. Appl. 
Genet. 75:30 (1987); and Benbrook et al., in Proceedings Bio Expo 1986, 
Butterworth, Stoneham, Mass., pp. 27-54 (1986). The disclosure of each of these is 
incorporated herein in its entirety by reference. 

Plant cells useful for transformation include cells cultured in suspension 
cultures, callus, embryos, meristem tissue, pollen, and the like. 

A variety of plant viruses that can be employed as vectors are known in the 
art and include cauliflower mosaic virus (CaMV), geminivirus, brome mosaic virus, 
and tobacco mosaic virus. 

Typical vectors useful for expression of genes in higher plants are well known 
in the art and include vectors derived from the tumor-inducing (Ti) plasmid of 
Agrobacterium tumefaciens described by Rogers et al., Meth. in Enzymol., 153:253-277 
(1987), the disclosure of which is incorporated herein in its entirety by reference. 
These vectors are plant integrating vectors in that, on transformation, the 
vectors integrate a portion of vector DNA into the genome of the host plant. 

A particularly preferred vector is a plasmid, by which is meant a circular 
double-stranded DNA molecule which is not a part of the chromosomes of the cell. 
Exemplary A. tumefaciens vectors useful herein are plasmids pKYLX6 and pKYLX7 
of Schardl et al., Gene, 61:1-11 (1987) and Berger et al., Proc. Natl. Acad. Sci. 
U.S.A., 86:8402-8406 (1989). Another useful vector herein is plasmid pBI101.2 that 
is available from Clontech Laboratories, Inc. (Palo Alto, CA). The disclosure of each 
of these is incorporated herein in its entirety by reference. 

A cell in which the foreign genetic material in a vector is functionally 
expressed has been "transformed" by the vector and is referred to as a 
"transformant". 

Either genomic DNA or cDNA encoding the gene of interest may be used in this 
invention. The gene of interest may also be constructed partially from a cDNA clone 
and partially from a genomic clone. 

When the gene of interest has been isolated, genetic constructs are made 
which contain the necessary regulatory sequences to provide for efficient expression 
of the gene in the host cell. 

According to this invention, the genetic construct will contain (a) a genetic 
sequence coding for the polypeptide of interest and (b) one or more regulatory 
sequences operably linked on either side of the structural gene of interest. 
Typically, the regulatory sequences will be a promoter or a terminator. The 
regulatory sequences may be from autologous or heterologous sources. 

The cloning vector will typically carry a replication origin, as well as specific 
genes that are capable of providing phenotypic selection markers in transformed 
host cells. Typically, genes conferring resistance to antibiotics or selected 
herbicides are used. After the genetic material is introduced into the target cells, 
successfully transformed cells and/or colonies of cells can be isolated by selection 
on the basis of these markers. 

Typical selectable markers include genes coding for resistance to the 
antibiotic spectinomycin (e.g., the aadA gene), the streptomycin phosphotransferase 
(SPT) gene coding for streptomycin resistance, the neomycin phosphotransferase 
(NPTII) gene encoding kanamycin or geneticin resistance, and the hygromycin 
phosphotransferase (HPT) gene coding for hygromycin resistance. 

Genes coding for resistance to herbicides include genes coding for resistance to 
herbicides which act to inhibit the action of acetolactate synthase (ALS), in particular 
the sulfonylurea-type herbicides (e.g., the acetolactate synthase (ALS) genes containing mutations 
leading to such resistance, in particular the S4 and/or Hra mutations), genes coding 
for resistance to herbicides which act to inhibit the action of glutamine synthase, such as 
phosphinothricin or Basta (e.g., the pat or bar gene), or other such genes known in 
the art. The bar gene encodes resistance to the herbicide Basta, and the ALS gene 
encodes resistance to the herbicide chlorsulfuron. 

Typically, an intermediate host cell will be used in the practice of this 
invention to increase the copy number of the cloning vector. With an increased copy 
number, the vector containing the gene of interest can be isolated in significant 
quantities for introduction into the desired plant cells. 

Host cells that can be used in the practice of this invention include 
prokaryotes, including bacterial hosts such as E. coli, S. typhimurium, and Serratia 
marcescens. Eukaryotic hosts such as yeast or filamentous fungi may also be used 
in this invention. Since these hosts are also microorganisms, it will be essential to 
ensure that plant promoters which do not cause expression of the polypeptide in 
bacteria are used in the vector. 

The isolated cloning vector will then be introduced into the plant cell using any 
convenient transformation technique as described above. 

III. Regeneration and Analysis of Transformants 

Following transformation, regeneration is used to obtain a whole plant 
from transformed cells, and the presence of the structural gene(s) or "transgene(s)" in 
the regenerated plant is detected by assays. The seed derived from the plant is 
then tested for levels of preselected amino acids. Depending on the type of plant 
and the level of gene expression, introduction of the structural gene into the plant 
seed endosperm can enhance the level of preselected amino acids in an amount 
useful to supplement the nutritional quality of those seeds. 

Using known techniques, protoplasts and cell or tissue cultures can be 
regenerated to form whole fertile plants which carry and express the gene for a 
polypeptide according to this invention. 

Accordingly, a highly preferred embodiment of the present invention is a 
transformed maize plant, the cells of which contain at least one copy of the DNA 
sequence of an expression cassette containing a gene encoding a polypeptide 
containing elevated amounts of an essential amino acid, such as an HT12, BHL or ESA 
protein. 

Techniques for regenerating plants from tissue culture, such as transformed 
protoplasts or callus cell lines, are known in the art. For example, see Phillips, et al., 
Plant Cell Tissue Organ Culture, Vol. 1, p. 123 (1981); Patterson, et al., Plant Sci., 
Vol. 42, p. 125 (1985); Wright, et al., Plant Cell Reports, Vol. 6, p. 83 (1987); and 
Barwale, et al., Planta, Vol. 167, p. 473 (1986); each incorporated herein in its 
entirety by reference. The selection of an appropriate method is within the skill of 
the art. 

Examples of the practice of the present invention detailed herein relate 
specifically to maize plants. However, the present invention is also applicable to 
other cereal plants. The expression vectors utilized herein are demonstrably 
capable of operation in cells of cereal plants both in tissue culture and in whole 
plants. The invention disclosed herein is thus operable in monocotyledonous 
species to transform individual plant cells and to achieve full, intact plants which can 
be regenerated from transformed plant cells and which express preselected 
polypeptides. 



The introduced structural genes are expressed in the transformed plant cells 
and stably transmitted (somatically and sexually) to the next generation of cells 
produced. The vector should be capable of introducing, maintaining, and expressing 
a structural gene in plant cells. The structural gene is passed on to progeny by 
normal sexual transmission. 

To confirm the presence of the structural gene(s) or "transgene(s)" in the 
regenerating plants, or seeds or progeny derived from the regenerated plant, a 
variety of assays can be performed. Such assays include Southern and Northern 
blotting; PCR; assays that detect the presence of a polypeptide product, e.g., by 
immunological means (ELISAs and Western blots) or by enzymatic function; plant 
part assays, such as leaf, seed or root assays; and also, by analyzing the phenotype 
of the whole regenerated plant. 

Whereas DNA analysis techniques can be conducted using DNA isolated 
from any part of a plant, RNA will be expressed in the seed endosperm and hence it 
will be necessary to prepare RNA for analysis from these tissues. 

PCR techniques can be used for detection and quantitation of RNA produced 
from introduced structural genes. In this application of PCR it is first necessary to 
reverse transcribe RNA into DNA, using enzymes such as reverse transcriptase, and 
then through the use of conventional PCR techniques amplify the DNA. In most 
instances PCR techniques, while useful, will not demonstrate integrity of the RNA 
product. 

Further information about the nature of the RNA product may be obtained by 
Northern blotting. This technique will demonstrate the presence of an RNA species 
and give information about the integrity of that RNA. The presence or absence of an 
RNA species can also be determined using dot or slot blot Northern hybridizations. 
These techniques are modifications of Northern blotting and will only demonstrate 
the presence or absence of an RNA species. 

While Southern blotting and PCR may be used to detect the structural gene in 
question, they do not provide information as to whether the structural gene is being 
expressed. Expression may be evaluated by specifically identifying the polypeptide 
products of the introduced structural genes or evaluating the phenotypic changes 
brought about by their expression. 

Assays for the production and identification of specific polypeptides may 
make use of physical-chemical, structural, functional, or other properties of the 
polypeptides. Unique physical-chemical or structural properties allow the 
polypeptides to be separated and identified by electrophoretic procedures, such as 
native or denaturing gel electrophoresis or isoelectric focusing, or by 
chromatographic techniques such as ion exchange or gel exclusion 
chromatography. 

The unique structures of individual polypeptides offer opportunities for use of 
specific antibodies to detect their presence in formats such as an ELISA assay. 
Combinations of approaches may be employed with even greater specificity such as 
Western blotting in which antibodies are used to locate individual gene products that 
have been separated by electrophoretic techniques. 

Additional techniques may be employed to absolutely confirm the identity of 
the product of interest such as evaluation by amino acid sequencing following 
purification. Although these are among the most commonly employed, other 
procedures may be additionally used. 

Very frequently, the expression of a gene product is determined by evaluating 
the phenotypic results of its expression. These assays also may take many forms, 
including but not limited to, analyzing changes in the chemical composition, 
morphology, or physiological properties of the plant. In particular, the elevated 
preselected amino acid content due to the expression of structural genes encoding 
polypeptides can be detected by amino acid analysis. 

Breeding techniques useful in the present invention are well known in the art. 

The present invention will be further described by reference to the following 
detailed examples. It is understood, however, that there are many extensions, 
variations, and modifications on the basic theme of the present invention beyond 
that shown in the examples and description, which are within the spirit and scope of 
the present invention. 

Examples 

EXAMPLE 1 

Construction of the HT12 gene and of other genes encoding polypeptides 
having an elevated level of a preselected amino acid. 

As noted above, the sequence of the HT12 gene is based on the mRNA 
sequence of the native Hordeum vulgare alpha hordothionin gene (accession 
number X05901, Ponz et al. 1986 Eur. J. Biochem. 156:131-135) modified to 
introduce 12 lysine residues into the mature hordothionin peptide (See Rao et al. 
1994 Protein Engineering 7(12):1485-1493, and WO 94/16078 published July 21, 
1994). 

The alpha hordothionin cDNA comprising the entire alpha hordothionin coding 
sequence is isolated by RT-PCR of mRNA from developing barley seed. Primers are 
designed based upon the published alpha hordothionin sequence to amplify the 
gene and to introduce an NcoI site at the start (ATG) codon and a BamHI site after 
the stop codon of the thionin coding sequence to facilitate cloning. 

Primers are designated as HTPCR1 
(5'-AGTATAAGTAAACACACCATCACACCCTTGAGGCCCTTGCTGGTGGCCATGGTG-3') and HTPCR2 
(5'-CCTCACATCCCTTAGTGCCTAAGTTCGACGTCGGGCCCTCTAGTCGACGGATCCA-3'). 
These primers are used in a PCR reaction to amplify alpha hordothionin by 
conventional methods. The resulting PCR product is purified and subcloned into the 
BamHI/NcoI digested pBSKP vector (Stratagene, La Jolla, CA) and sequenced on 
both strands to confirm its identity. The clone is designated pBSKP-HT (Seq. ID 1). 
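
As a quick sanity check on the primer design, the sketch below searches the two amplification primers for the NcoI and BamHI recognition sequences (CCATGG and GGATCC). Both sites are palindromic, so scanning the printed strand is sufficient. The helper name `find_sites` and the dictionary layout are illustrative assumptions, not part of the described method.

```python
# Recognition sequences for the two enzymes named in the text.
SITES = {"NcoI": "CCATGG", "BamHI": "GGATCC"}

# Primer sequences as printed in this example (5' to 3').
PRIMERS = {
    "HTPCR1": "AGTATAAGTAAACACACCATCACACCCTTGAGGCCCTTGCTGGTGGCCATGGTG",
    "HTPCR2": "CCTCACATCCCTTAGTGCCTAAGTTCGACGTCGGGCCCTCTAGTCGACGGATCCA",
}


def find_sites(seq):
    """Return {enzyme: [0-based start positions]} for each site found in seq."""
    hits = {}
    for enzyme, site in SITES.items():
        positions = [
            i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)
        ]
        if positions:
            hits[enzyme] = positions
    return hits


for name, seq in PRIMERS.items():
    print(name, len(seq), find_sites(seq))
```

Run as written, the check should find an NcoI site only in HTPCR1 and a BamHI site only in HTPCR2, which matches the cloning strategy described for these primers.
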
Primers are designed for single stranded DNA site-directed mutagenesis to 
introduce 12 codons for lysine, based on the peptide structure of hordothionin 12 
(Ref: Rao et al. 1994 Protein Engineering 7(12):1485-1493) and are designated 
HT12mut1 (5'-AGCGGAAAATGCCCGAAAGGCTTCCCCAAATTGGC-3'), 
HT12mut2 (5'-TGCGCAGGCGTCTGCAAGTGTAAGCTGACTAGTAGCGGAAAATGC-3'), 
HT12mut3 (5'-TACAACCTTTGCAAAGTCAAAGGCGCCAAGAAGCTTTGCGCAGGCGTCTG-3'), and 
HT12mut4 (5'-GCAAGAGTTGCTGCAAGAGTACCCTGGGAAGGAAGTGCTACAACCTTTGC-3'). 

Sequence analysis is used to verify the desired sequence of the resulting 
plasmid, designated pBSKP-HT12 (Seq. ID 2). 

Similarly, genes encoding other derivatives of hordothionin, as described 
above (See U.S. Ser. Nos. 08/838,763 filed April 10, 1997; 08/824,379 filed March 
26, 1997; 08/824,382 filed March 26, 1997; and U.S. Pat. No. 5,703,409 issued 
December 30, 1997), the gene encoding enhanced soybean albumin (ESA) (See 
U.S. Ser. No. 08/618,911), and genes encoding BHL and other derivatives of the 
barley chymotrypsin inhibitor (See U.S. Ser. No. 08/740,682 filed November 1, 1996 
and PCT/US97/20441 filed October 31, 1997) are constructed by site directed 
mutagenesis from pBSKP-HT, a subclone of the soybean 2S albumin 3 gene in the 
pBSKP vector (Stratagene, La Jolla, CA), and a subclone of the barley chymotrypsin 
inhibitor in the pBSKP vector, respectively. 
EXAMPLE 2 

Construction of vectors for seed preferred expression of polypeptides having 
an elevated level of a preselected amino acid. 

A 442 bp DNA fragment containing the modified hordothionin gene encoding 
HT12 is isolated from plasmid pBSKP-HT12 by NcoI/BamHI restriction digestion and gel 
purification, and is ligated between the 27 kD gamma zein promoter and 27 kD 
gamma zein terminator of the NcoI/BamHI digested vector PHP3630. PHP3630 is 
a subclone of the endosperm-preferred 27 kD gamma zein gene (Genbank 
accession number X58197) in the pBSKP vector (Stratagene), which is modified by 
site directed mutagenesis by insertion of an NcoI site at the start codon (ATG) of the 
27 kD gamma zein coding sequence. The 27 kD gamma zein coding sequence is 
replaced with the HT12 coding sequence. The resulting expression vector 
containing the chimeric gene construct gz::HT12::gz, designated as PHP8001 (Seq. 
ID 3), is verified by extensive restriction digest analysis and DNA sequencing. 

Similarly, the 442 bp DNA fragment containing the HT12 coding sequence is 
inserted between the globulin1 promoter and the globulin1 terminator of the 
embryo-preferred corn globulin1 gene (Genbank accession number X59083), and between 
the waxy promoter and the waxy terminator of the endosperm-preferred waxy gene 
(Genbank accession number M24258). The globulin1 and waxy coding sequences, 
respectively, are replaced with the HT12 coding sequence. The resulting chimeric 
genes glb1::HT12::glb1 and wx::HT12::wx are designated as PHP7999 (Seq. ID 4) 
and PHP5025 (Seq. ID 5), respectively. 

In a like manner, expression vectors containing genes encoding other 
derivatives of hordothionin (See Rao et al. 1994 Protein Engineering 7(12):1485-1493, 
and WO 94/16078 published July 21, 1994), the gene encoding enhanced 
soybean albumin (ESA) (See U.S. Ser. No. 08/618,911), and genes encoding BHL 
and other derivatives of the barley chymotrypsin inhibitor (See U.S. Ser. No. 
08/740,682 filed November 1, 1996 and PCT/US97/20441 filed October 31, 1997) 
are constructed by insertion of the corresponding coding sequences between the 
promoter and terminator of the 27 kD gamma zein gene, the globulin1 gene and the 
waxy gene, respectively. Resulting chimeric genes are for example gz::ESA::gz and 
gz::BHL::gz, designated as PHP11260 (Seq. ID 6) and as PHP11427 (Seq. ID 7), 
respectively. 

The resulting expression vectors are used in conjunction with the selectable 
marker expression cassette PHP3528 (enhanced CaMV::Bar::PinII) for particle 
bombardment transformation of maize immature embryos. 

EXAMPLE 3 

Preparation of Transgenic Plants 

The general method of genetic transformation used to produce transgenic 
maize plants is mediated by bombardment of embryogenically responsive immature 
embryos with tungsten particles associated with DNA plasmids, said plasmids 
consisting of a selectable and an unselectable marker gene. 

Preparation of Tissue 

Immature embryos of "High Type II" are the target for particle bombardment-mediated 
transformation. This genotype is the F1 of two purebred genetic lines, 
parent A and parent B, derived from A188 X B73. Both parents are selected for high 
competence of somatic embryogenesis. See Armstrong, et al., "Development and 
Availability of Germplasm with High Type II Culture Formation Response," Maize 
Genetics Cooperation Newsletter, Vol. 65, pp. 92 (1991); incorporated herein in its 
entirety by reference. 

Ears from F1 plants are selfed or sibbed, and embryos are aseptically 
dissected from developing caryopses when the scutellum first becomes opaque. 
The proper stage occurs about 9-13 days post-pollination, and most generally about 
10 days post-pollination, and depends on growth conditions. The embryos are 
about 0.75 to 1.5 mm long. Ears are surface sterilized with 20-50% Clorox for 30 
min, followed by 3 rinses with sterile distilled water. 

Immature embryos are cultured, scutellum oriented upward, on embryogenic 
induction medium comprised of N6 basal salts (Chu, et al., "Establishment of an 
Efficient Medium for Anther Culture of Rice Through Comparative Experiments on 
the Nitrogen Sources," Scientia Sinica (Peking), Vol. 18, pp. 659-668 (1975); 
incorporated herein in its entirety by reference), Eriksson vitamins (See Eriksson, T., 
"Studies on the Growth Requirements and Growth Measurements of Haplopappus 
gracilis," Physiol. Plant., Vol. 18, pp. 976-993 (1965); incorporated herein in its 
entirety by reference), 0.5 mg/l thiamine HCl, 30 gm/l sucrose, 2.88 gm/l L-proline, 1 
mg/l 2,4-dichlorophenoxyacetic acid, 2 gm/l Gelrite, and 8.5 mg/l AgNO3. 

The medium is sterilized by autoclaving at 121°C for 15 min and dispensed 
into 100 X 25 mm petri dishes. AgNO3 is filter-sterilized and added to the medium 
after autoclaving. The tissues are cultured in complete darkness at 28°C. After 
about 3 to 7 days, generally about 4 days, the scutellum of the embryo has swelled 
to about double its original size and the protuberances at the coleorhizal surface of 
the scutellum indicate the inception of embryogenic tissue. Up to 100% of the 
embryos display this response, but most commonly, the embryogenic response 
frequency is about 80%. 

When the embryogenic response is observed, the embryos are transferred to 
a medium comprised of induction medium modified to contain 120 gm/l sucrose. 
The embryos are oriented with the coleorhizal pole, the embryogenically responsive 
tissue, upwards from the culture medium. Ten embryos per petri dish are located in 
the center of a petri dish in an area about 2 cm in diameter. The embryos are 
maintained on this medium for 3-16 hr, preferably 4 hours, in complete darkness at 
28°C just prior to bombardment with particles associated with plasmid DNAs 
containing the selectable and unselectable marker genes. 

To effect particle bombardment of embryos, the particle-DNA agglomerates 
are accelerated using a DuPont PDS-1000 particle acceleration device. The 
particle-DNA agglomeration is briefly sonicated and 10 µl are deposited on 
macrocarriers and the ethanol allowed to evaporate. The macrocarrier is 
accelerated onto a stainless-steel stopping screen by the rupture of a polymer 
diaphragm (rupture disk). Rupture is effected by pressurized helium. Depending on 
the rupture disk breaking pressure, the velocity of particle-DNA acceleration may be 
varied. Rupture disk pressures of 200 to 1800 psi are commonly used, with those of 
650 to 1100 psi being more preferred, and about 900 psi being most highly 
preferred. Rupture disk breaking pressures are additive so multiple disks may be 
used to effect a range of rupture pressures. 

Preferably, the shelf containing the plate with embryos is 5.1 cm below the 
bottom of the macrocarrier platform (shelf #3), but may be located at other 
distances. To effect particle bombardment of cultured immature embryos, a rupture 
disk and a macrocarrier with dried particle-DNA agglomerates are installed in the 
device. The He pressure delivered to the device is adjusted to 200 psi above the 
rupture disk breaking pressure. A petri dish with the target embryos is placed into 
the vacuum chamber and located in the projected path of accelerated particles. A 
vacuum is created in the chamber, preferably about 28 inches Hg. After operation of 
the device, the vacuum is released and the petri dish is removed. 
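
The pressure settings described above reduce to simple arithmetic, sketched here in Python. The function names and the example disk ratings (two 450 psi disks) are hypothetical; only the additive rule, the 200 psi helium margin, and the stated pressure ranges come from the text.

```python
COMMON_RANGE_PSI = (200, 1800)     # "commonly used" breaking pressures
PREFERRED_RANGE_PSI = (650, 1100)  # "more preferred" breaking pressures
HELIUM_MARGIN_PSI = 200            # helium is set this far above breaking pressure


def stacked_breaking_pressure(disk_ratings_psi):
    """Breaking pressures are additive, so a stack breaks at the sum."""
    return sum(disk_ratings_psi)


def helium_setting(breaking_psi):
    """Helium pressure delivered to the device for a given breaking pressure."""
    return breaking_psi + HELIUM_MARGIN_PSI


def classify(breaking_psi):
    """Label a breaking pressure using the ranges given in the text."""
    low, high = PREFERRED_RANGE_PSI
    if low <= breaking_psi <= high:
        return "more preferred"
    low, high = COMMON_RANGE_PSI
    if low <= breaking_psi <= high:
        return "commonly used"
    return "outside stated range"


if __name__ == "__main__":
    stack = [450, 450]  # hypothetical disk ratings, for illustration only
    breaking = stacked_breaking_pressure(stack)
    print(breaking, helium_setting(breaking), classify(breaking))
```
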

Bombarded embryos remain on the osmotically adjusted medium during 
bombardment, and preferably for two days subsequently, although the embryos may 
remain on this medium for 1 to 4 days. The embryos are transferred to selection 
medium comprised of N6 basal salts, Eriksson vitamins, 0.5 mg/l thiamine HCl, 30 
gm/l sucrose, 1 mg/l 2,4-dichlorophenoxyacetic acid, 2 gm/l Gelrite, 0.85 mg/l 
AgNO3 and 3 mg/l bialaphos. Bialaphos is added filter-sterilized. The embryos are 
subcultured to fresh selection medium at 10 to 14 day intervals. After about 7 weeks, 
embryogenic tissue, putatively transgenic for both selectable and unselected marker 
genes, is seen to proliferate from about 7% of the bombarded embryos. Putative 
transgenic tissue is rescued, and that tissue derived from individual embryos is 
considered to be an event and is propagated independently on selection medium. 
Two cycles of clonal propagation are achieved by visual selection for the smallest 
contiguous fragments of organized embryogenic tissue. 
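
To make the differences between the culture media easier to see, the sketch below records the induction, osmotic, and selection recipes as Python dictionaries and lists the components that change between them. Basal salts and vitamins are left out because they are the same in all three; concentrations are in mg/l, and the dictionary names and the `diff_media` helper are illustrative assumptions. The AgNO3 values are entered as printed (8.5 mg/l for induction, 0.85 mg/l for selection).

```python
# Supplements in mg/l, as listed in the text (N6 salts and Eriksson vitamins omitted).
INDUCTION = {
    "thiamine HCl": 0.5,
    "sucrose": 30_000,
    "L-proline": 2_880,
    "2,4-D": 1,
    "Gelrite": 2_000,
    "AgNO3": 8.5,
}

# Osmotic pre-treatment medium: induction medium with sucrose raised to 120 g/l.
OSMOTIC = {**INDUCTION, "sucrose": 120_000}

SELECTION = {
    "thiamine HCl": 0.5,
    "sucrose": 30_000,
    "2,4-D": 1,
    "Gelrite": 2_000,
    "AgNO3": 0.85,
    "bialaphos": 3,
}


def diff_media(old, new):
    """Return {component: (old value, new value)} for every component that differs."""
    changes = {}
    for component in sorted(set(old) | set(new)):
        if old.get(component) != new.get(component):
            changes[component] = (old.get(component), new.get(component))
    return changes


if __name__ == "__main__":
    print("induction -> osmotic:", diff_media(INDUCTION, OSMOTIC))
    print("induction -> selection:", diff_media(INDUCTION, SELECTION))
```

A value of `None` in a pair means the component is absent from that medium as listed.
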

For regeneration of transgenic plants, embryogenic tissue is subcultured to 
medium comprised of MS salts and vitamins (Murashige, T. and F. Skoog, "A 
revised medium for rapid growth and bio assays with tobacco tissue cultures"; 
Physiologia Plantarum; Vol. 15; pp. 473-497; 1962; incorporated herein in its entirety 
by reference), 100 mg/l myo-inositol, 60 gm/l sucrose, 3 gm/l Gelrite, 0.5 mg/l zeatin, 
1 mg/l indole-3-acetic acid, 26.4 µg/l cis-trans-abscisic acid, and 3 mg/l bialaphos in 
100 X 25 mm petri dishes and incubated in darkness at 28°C until the development 
of well-formed, matured somatic embryos can be visualized. This requires about 14 
days. 

Well-formed somatic embryos are opaque and cream-colored, and are 
comprised of an identifiable scutellum and coleoptile. The embryos are individually 
subcultured to germination medium comprised of MS salts and vitamins, 100 mg/l 
myo-inositol, 40 gm/l sucrose and 1.5 gm/l Gelrite in 100 X 25 mm petri dishes and 
incubated under a 16 hr light: 8 hr dark photoperiod and 40 µEinsteins m⁻² sec⁻¹ from 
cool-white fluorescent tubes. After about 7 days, the somatic embryos have 
germinated and produced a well-defined shoot and root. The individual plants are 
subcultured to germination medium in 125 x 25 mm glass tubes to allow further plant 
development. The plants are maintained under a 16 hr light: 8 hr dark photoperiod 
and 40 µEinsteins m⁻² sec⁻¹ from cool-white fluorescent tubes. 

After about 7 days, the plants are well-established and are transplanted to 
horticultural soil, hardened off, and potted into commercial greenhouse soil mixture 
and grown to sexual maturity in a greenhouse. An elite inbred line is used as a male 
to pollinate regenerated transgenic plants. 

Preparation of Particles 

Fifteen mg of tungsten particles (General Electric), 0.5 to 1.8 µm, preferably 
1 to 1.8 µm, and most preferably 1 µm, are added to 2 ml of concentrated nitric acid. 
This suspension is sonicated at 0°C for 20 min (Branson Sonifier Model 450, 40% 
output, constant duty cycle). Tungsten particles are pelleted by centrifugation at 
10,000 rpm (Biofuge) for 1 min and the supernatant is removed. Two ml of sterile 
distilled water is added to the pellet, which is sonicated briefly to resuspend the particles. 
The suspension is pelleted, 1 ml of absolute ethanol is added to the pellet, and the pellet is 
sonicated briefly to resuspend the particles. The particles are rinsed, pelleted, and 
resuspended a further 2 times with sterile distilled water, and finally resuspended 
in 2 ml of sterile distilled water. The particles are subdivided into 250 µl 
aliquots and stored frozen. 

Preparation of particle-plasmid DNA association 

The stock of tungsten particles is sonicated briefly in a water bath sonicator 
(Branson Sonifier Model 450, 20% output, constant duty cycle) and 50 µl is 
transferred to a microfuge tube. Plasmid DNA is added to the particles for a final 
DNA amount of 0.1 to 10 µg in 10 µl total volume, and briefly sonicated. Preferably 
1 µg total DNA is used. Specifically, 5 µl of PHP8001 (gz::HT12::gz) and 5 µl of 
PHP3528 (enhanced CaMV::Bar::PinII) at 0.1 µg/µl in TE buffer, are added to the 
particle suspension. Fifty µl of sterile aqueous 2.5 M CaCl2 are added, and the 
mixture is briefly sonicated and vortexed. Twenty µl of sterile aqueous 0.1 M 
spermidine are added and the mixture is briefly sonicated and vortexed. The 
mixture is incubated at room temperature for 20 min with intermittent brief 
sonication. The particle suspension is centrifuged, and the supernatant is removed. 
Two hundred fifty µl of absolute ethanol is added to the pellet and briefly sonicated. 
The suspension is pelleted, the supernatant is removed, and 60 µl of absolute 
ethanol is added. The suspension is sonicated briefly before loading the 
particle-DNA agglomeration onto macrocarriers. 
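
The quantities in the particle and DNA preparation steps can be cross-checked with a short calculation, sketched below. It assumes the garbled unit symbols in the source stand for microlitres and micrograms, that 10 µl of the final suspension is loaded per macrocarrier (as stated in the bombardment procedure), and that no tungsten or DNA is lost during the washes, so the per-load figures are upper bounds. All variable and function names are illustrative.

```python
# Quantities as read from the text; ul = microlitres, ug = micrograms (assumed units).
TUNGSTEN_MG = 15.0
STOCK_VOLUME_UL = 2000.0
ALIQUOT_UL = 250.0
PARTICLES_PER_PREP_UL = 50.0
PLASMID_UL = {"PHP8001": 5.0, "PHP3528": 5.0}
PLASMID_UG_PER_UL = 0.1
FINAL_ETHANOL_UL = 60.0
LOAD_PER_MACROCARRIER_UL = 10.0


def summarize():
    """Derive per-preparation and per-load amounts, ignoring wash losses."""
    stock_mg_per_ul = TUNGSTEN_MG / STOCK_VOLUME_UL
    tungsten_per_prep_mg = stock_mg_per_ul * PARTICLES_PER_PREP_UL
    total_dna_ug = sum(vol * PLASMID_UG_PER_UL for vol in PLASMID_UL.values())
    loads = FINAL_ETHANOL_UL / LOAD_PER_MACROCARRIER_UL
    return {
        "aliquots_per_stock": STOCK_VOLUME_UL / ALIQUOT_UL,
        "tungsten_per_prep_mg": tungsten_per_prep_mg,
        "total_dna_ug": total_dna_ug,
        "macrocarrier_loads": loads,
        "dna_per_load_ug": total_dna_ug / loads,
        "tungsten_per_load_mg": tungsten_per_prep_mg / loads,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value:.4g}")
```

Under these assumptions the two 5 µl plasmid additions at 0.1 µg/µl give the 1 µg of total DNA that the text describes as preferred.
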
EXAMPLE 4 

Analysis of seed from transgenic plants for recombinant polypeptides having 
an elevated level of a preselected amino acid. 

Preparation of meals from corn seed 

Pooled or individual dry seed harvested from transformed plants from the 
greenhouse or the field are prepared in one of the following ways: 

A. Seed is imbibed in sterile water overnight (16-20 hr) at 4°C. The imbibed 
seed is dissected into embryo, endosperm and pericarp. The embryos and 
endosperm are separately frozen in liquid N2; the pericarps are discarded. 
Frozen tissue is ground with a liquid N2 chilled ceramic mortar and pestle to a 
fine meal. The meals are dried under vacuum and stored at -20°C or -80°C. 

B. Dry whole seed is ground to a fine meal with a ball mill (Klecko), or by hand 
with a ceramic mortar and pestle. For analysis of endosperm only, the 
embryos are removed with a drill and discarded. The remaining endosperm 
with pericarp is ground with a ball mill or a mortar and pestle. 

ELISA analysis 

Rabbit polyclonal anti-HT12 antisera are produced with synthetic HT12 (see
Rao et al., supra) at Bethyl Laboratories. An HT12 ELISA assay is developed and
performed by the Analytical Biochemistry department of Pioneer Hi-Bred International,
Inc., essentially as described by Harlow and Lane, Antibodies, A Laboratory Manual,
Cold Spring Harbor Publication, New York (1988). Quantitative ELISA assays are
first performed on pooled meals to identify positive events. Positive events are further
analyzed by quantitative ELISA on individual kernels to determine the relative level of
HT12 expression and the transgene segregation ratio. Among 97 events tested, 59 show
HT12 expression levels >1000 ppm. The highest events have HT12 expression levels
at 2-5% of the total seed protein. Typical results for HT12 levels for whole kernels of
wild type corn, for one event (TC2031) of corn transformed with the gz::HT12::gz
chimeric gene, expressing HT12 in the endosperm, for one event (TC320) of corn
transformed with the wx::HT12::wx chimeric gene, expressing HT12 in the endosperm,
and for one event (TC2027) of corn transformed with the glb1::HT12::glb1 chimeric
gene, expressing HT12 in the embryo, are shown in Table 1.
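
The arithmetic behind the ELISA screen can be sketched as follows. This is an illustrative sketch only, not the procedure used by the Analytical Biochemistry department: the function names, the kernel counts, and the assumption of a 3:1 Mendelian expectation for a single-locus insertion in selfed seed are hypothetical and are not taken from the specification.

```python
def ppm_to_percent(ppm: float) -> float:
    """Convert parts per million to percent (1% equals 10,000 ppm)."""
    return ppm / 10_000.0


def chi_square_segregation(positive: int, negative: int,
                           expected_ratio: tuple = (3, 1)) -> float:
    """Chi-square statistic (1 degree of freedom) for ELISA-positive versus
    ELISA-negative kernel counts against an expected segregation ratio.

    The 3:1 default is an assumption (single dominant locus, selfed ear).
    """
    total = positive + negative
    expected_pos = total * expected_ratio[0] / sum(expected_ratio)
    expected_neg = total - expected_pos
    return ((positive - expected_pos) ** 2 / expected_pos
            + (negative - expected_neg) ** 2 / expected_neg)


# Critical chi-square value at the 5% level for 1 degree of freedom.
CRITICAL_5PCT_1DF = 3.841

# Hypothetical counts of individual kernels scored by quantitative ELISA.
chi2 = chi_square_segregation(positive=72, negative=28)
fits_3_to_1 = chi2 < CRITICAL_5PCT_1DF

# The 1000 ppm screening threshold expressed as a percentage of protein.
threshold_percent = ppm_to_percent(1000)
```

Under these assumptions, an event whose statistic falls below the critical value is consistent with the expected ratio; a different expected ratio can be supplied through `expected_ratio`.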

Similarly, antisera are produced, ELISA assays are developed and assays of
seed from transformed plants are performed for other derivatives of hordothionin
(See Rao et al. 1994 Protein Engineering 7(12):1485-1493, and WO 94/16078
published July 21, 1994), for the enhanced soybean albumin (ESA) (See U.S. Ser.
No. 08/618,911) and for BHL and other derivatives of the barley chymotrypsin
inhibitor (See U.S. Ser. No. 08/740,682 filed November 1, 1996 and
PCT/US97/20441 filed October 31, 1997), respectively.

Polyacrylamide gel and immuno blot analysis

SDS extracts of meals, molecular weight markers, and a synthetic HT12 positive
control (see Rao et al., supra) are separated on 16.5% or 8-22% polyacrylamide
gradient Tris-Tricine gels (Schagger, H. and Von Jagow, G. 1987 Anal. Biochem.,
166:368). For immuno blot analysis, gels are transferred to PVDF membranes in 100
mM CAPS, pH 11; 10% methanol using a semidry blotter (Hoefer, San Francisco, CA).
After transfer the membrane is blocked in BLOTTO (4% dry milk in Tris-buffered
saline, pH 7.5) (Johnson, D. A., Gautsch, J. W., Sportsman, J. R., and Elder, J. H.
1984, Gene Anal. Techn., 1:3). The blots are incubated with rabbit anti-HT12 (same
as used for ELISA) diluted 1:2000 to 1:7500 in BLOTTO for 2 hr at room temperature
(22°C) or overnight at 4°C. Blots are washed 4-5X with BLOTTO, then incubated 1-2
hr with horseradish peroxidase-goat anti-rabbit IgG (Promega, Madison, WI) diluted
1:7500 to 1:15000 in BLOTTO. After secondary antibody, the blots are washed 3X
with BLOTTO followed by 2 washes with Tris-buffered saline, pH 7.5. Blots are briefly
incubated with enhanced chemiluminescence (ECL, Amersham, Arlington Heights, IL)
substrate, and wrapped in plastic wrap. Reactive bands are visualized by exposure
to x-ray film (Kodak Biomax MR) for short exposure times ranging from 5 to 120 sec.
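
The antibody dilutions stated above translate into pipetting volumes as in the following minimal sketch. It assumes that a dilution written 1:N means one part antiserum in N total parts, and that the incubation volume (10 mL here) is chosen by the operator; neither assumption, nor the function name, comes from the specification.

```python
def antibody_volume_ul(final_volume_ml: float, dilution_factor: float) -> float:
    """Microlitres of stock antibody needed to reach a 1:dilution_factor
    dilution in final_volume_ml of BLOTTO (1 part stock in N total parts)."""
    if final_volume_ml <= 0 or dilution_factor <= 0:
        raise ValueError("volume and dilution factor must be positive")
    return final_volume_ml * 1000.0 / dilution_factor


# Hypothetical 10 mL incubation volume.
primary_low = antibody_volume_ul(10, 2000)      # anti-HT12 at 1:2000
primary_high = antibody_volume_ul(10, 7500)     # anti-HT12 at 1:7500
secondary_low = antibody_volume_ul(10, 15000)   # HRP conjugate at 1:15000
```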

HT12 transgenic seed shows a distinctive band not seen in wild type seed at the
correct molecular weight and position as judged by the HT12 positive control standard
and molecular weight markers. These results indicate that the expressed HT12
prepropeptide is being correctly processed like native HT in barley. Novel polypeptide
bands co-migrating with the HT12 positive control are also observed in Coomassie
stained polyacrylamide gels loaded with 10 mg total extracted protein, indicating
substantial expression and accumulation of HT12 protein in the seed.

Similarly, other derivatives of hordothionin, soybean albumin, the enhanced
soybean albumin (ESA), BHL and other derivatives of the barley chymotrypsin
inhibitor are detected by polyacrylamide gel and immuno blot analysis.

Amino acid composition analysis 

Meals from seed, endosperm or embryo that express a recombinant polypeptide
having an elevated level of a preselected amino acid are sent to the University of
Iowa Protein Structure Facility for amino acid composition analysis using standard
protocols for digestion and analysis.

Typical results for the amino acid composition of whole kernels of wild type corn,
for one event (TC2031) of corn transformed with the gz::HT12::gz chimeric gene,
expressing HT12 in the endosperm, for one event (TC320) of corn transformed with
the wx::HT12::wx chimeric gene, expressing HT12 in the endosperm, and for one
event (TC2027) of corn transformed with the glb1::HT12::glb1 chimeric gene,
expressing HT12 in the embryo, are shown in Table 1.

Table 1: HT12 ELISA analysis and amino acid composition of meal from whole
kernels from wild type corn and from transformed corn expressing recombinant
HT12.

transgene            none        wx::HT12::wx   gz::HT12::gz   glb1::HT12::glb1
event                wild-type   TC320          TC2031         TC2027

ELISA
HT12 (protein ppm)   0.00        6200           8000           22600

AA (Meal %)          n=3         n=2            n=3            n=4
Lys                  0.29        0.38           0.39           0.24
Arg                  0.52        0.58           0.56           0.45
Cys                  0.12        0.19           0.17           0.22


The results in Table 1 demonstrate that corn expressing recombinant HT12 in the
endosperm shows a significant increase in the preselected amino acid lysine
(0.38-0.39% of meal in events TC320 and TC2031, versus 0.29% in wild type).
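
The size of the changes reported in Table 1 can be expressed relative to wild type with a short calculation. The following sketch simply restates the Table 1 values; the dictionary layout and the function name are illustrative choices and are not part of the specification.

```python
# Values transcribed from Table 1 (amino acids as percent of meal).
TABLE_1 = {
    "wild-type": {"ht12_ppm": 0,     "Lys": 0.29, "Arg": 0.52, "Cys": 0.12},
    "TC320":     {"ht12_ppm": 6200,  "Lys": 0.38, "Arg": 0.58, "Cys": 0.19},
    "TC2031":    {"ht12_ppm": 8000,  "Lys": 0.39, "Arg": 0.56, "Cys": 0.17},
    "TC2027":    {"ht12_ppm": 22600, "Lys": 0.24, "Arg": 0.45, "Cys": 0.22},
}


def relative_change(event: str, amino_acid: str) -> float:
    """Percent change of an amino acid in `event` relative to wild type."""
    reference = TABLE_1["wild-type"][amino_acid]
    return (TABLE_1[event][amino_acid] - reference) / reference * 100.0


# Relative lysine change, rounded to one decimal place, for each event.
lysine_change = {
    event: round(relative_change(event, "Lys"), 1)
    for event in ("TC320", "TC2031", "TC2027")
}
```

On these figures the two endosperm events carry roughly one third more lysine than wild type, whereas the embryo event TC2027 does not show a lysine increase in whole-kernel meal.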

Table 2: SEQUENCE INFORMATION

SEQUENCE ID                                           PROMOTER     GENE
Seq. 1: pBSKP-HT                                      None         3361-2947
Seq. 2: pBSKP-HT12                                    None         3361-2947
Seq. 3: PHP8001 gz::HT12::gz expression vector        676-2198     2199-2612
Seq. 4: PHP7999 glb1::HT12::glb1 expression vector    3271-1834    1834-1420
Seq. 5: PHP5025 wx::HT::wx expression vector          43-1342      1343-1757
Seq. 6: PHP11260 gz::ESA::gz expression vector        676-2198     2199-2675
Seq. 7: PHP11427 gz::BHL::gz                          676-2198     2199-2450
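
The coordinates in Table 2 can be used to pull a promoter or gene segment out of the corresponding sequence. The sketch below is offered under stated assumptions rather than as the applicants' method: it assumes the coordinates are 1-based and inclusive, and that a range written high-to-low (for example 3361-2947) lies on the complementary strand and is therefore returned as the reverse complement. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_feature(seq: str, start: int, end: int) -> str:
    """Extract a feature given 1-based inclusive coordinates.

    If start > end the feature is assumed to lie on the complementary
    strand, and the reverse complement of positions end..start is returned.
    """
    low, high = min(start, end), max(start, end)
    if low < 1 or high > len(seq):
        raise ValueError("coordinates fall outside the sequence")
    segment = seq[low - 1:high]
    return segment if start <= end else reverse_complement(segment)


# Toy sequence for illustration only; not taken from the sequence listing.
toy = "ATGCCGTA"
forward = extract_feature(toy, 2, 4)   # positions 2..4 on the given strand
reverse = extract_feature(toy, 4, 2)   # same positions, complementary strand
```

Whitespace and position numbers would need to be stripped from a listing entry before it is passed to `extract_feature`.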

SEQUENCE LISTING



(1) GENERAL INFORMATION 

(i) APPLICANT: Jung, Rudolf 

Beach, Larry R. 
Dress, Virginia M. 
Rao, A. Gururaj 
Ranch, Jerome P. 
Ertl, David S. 
Higgins, Regina K. 

(ii) TITLE OF THE INVENTION: Alteration of Amino Acid Compos 

in Seeds 

(iii) NUMBER OF SEQUENCES: 13 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Pioneer Hi-Bred International, Inc. 

(B) STREET: 7100 NW 62nd Avenue, P.O. Box 1000 

(C) CITY: Johnston 

(D) STATE: IA

(E) COUNTRY: USA 

(F) ZIP: 50131 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette 

(B) COMPUTER: IBM Compatible 

(C) OPERATING SYSTEM: DOS 

(D) SOFTWARE: FastSEQ for Windows Version 2.0 

(vi) CURRENT APPLICATION DATA: 
(A) APPLICATION NUMBER: 

(B) FILING DATE:
(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 



(viii) ATTORNEY / AGENT INFORMATION: 

(A) NAME: Michel, Marianne H 

(B) REGISTRATION NUMBER: 35,286 

(C) REFERENCE /DOCKET NUMBER: 0815 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 515-334-4467 

(B) TELEFAX: 515-334-6883 

(C) TELEX: 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3363 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

TCGACCTCGA GGGGGGGCCC GGTACCCAGC TTTTGTTCCC TTTAGTGAGG GTTAATTGCG 60
CGCTTGGCGT AATCATGGTC ATAGCTGTTT CCTGTGTGAA ATTGTTATCC GCTCACAATT 120
CCACACAACA TACGAGCCGG AAGCATAAAG TGTAAAGCCT GGGGTGCCTA ATGAGTGAGC 180
TAACTCACAT TAATTGCGTT GCGCTCACTG CCCGCTTTCC AGTCGGGAAA CCTGTCGTGC 240
CAGCTGCATT AATGAATCGG CCAACGCGCG GGGAGAGGCG GTTTGCGTAT TGGGCGCTCT 300
TCCGCTTCCT CGCTCACTGA CTCGCTGCGC TCGGTCGTTC GGCTGCGGCG AGCGGTATCA 360
GCTCACTCAA AGGCGGTAAT ACGGTTATCC ACAGAATCAG GGGATAACGC AGGAAAGAAC 420
ATGTGAGCAA AAGGCCAGCA AAAGGCCAGG AACCGTAAAA AGGCCGCGTT GCTGGCGTTT 480
TTCCATAGGC TCCGCCCCCC TGACGAGCAT CACAAAAATC GACGCTCAAG TCAGAGGTGG 540
CGAAACCCGA CAGGACTATA AAGATACCAG GCGTTTCCCC CTGGAAGCTC CCTCGTGCGC 600
TCTCCTGTTC CGACCCTGCC GCTTACCGGA TACCTGTCCG CCTTTCTCCC TTCGGGAAGC 660
GTGGCGCTTT CTCATAGCTC ACGCTGTAGG TATCTCAGTT CGGTGTAGGT CGTTCGCTCC 720
AAGCTGGGCT GTGTGCACGA ACCCCCCGTT CAGCCCGACC GCTGCGCCTT ATCCGGTAAC 780
TATCGTCTTG AGTCCAACCC GGTAAGACAC GACTTATCGC CACTGGCAGC AGCCACTGGT 840
AACAGGATTA GCAGAGCGAG GTATGTAGGC GGTGCTACAG AGTTCTTGAA GTGGTGGCCT 900
AACTACGGCT ACACTAGAAG GACAGTATTT GGTATCTGCG CTCTGCTGAA GCCAGTTACC 960
TTCGGAAAAA GAGTTGGTAG CTCTTGATCC GGCAAACAAA CCACCGCTGG TAGCGGTGGT 1020
TTTTTTGTTT GCAAGCAGCA GATTACGCGC AGAAAAAAAG GATCTCAAGA AGATCCTTTG 1080
ATCTTTTCTA CGGGGTCTGA CGCTCAGTGG AACGAAAACT CACGTTAAGG GATTTTGGTC 1140
ATGAGATTAT CAAAAAGGAT CTTCACCTAG ATCCTTTTAA ATTAAAAATG AAGTTTTAAA 1200
TCAATCTAAA GTATATATGA GTAAACTTGG TCTGACAGTT ACCAATGCTT AATCAGTGAG 1260
GCACCTATCT CAGCGATCTG TCTATTTCGT TCATCCATAG TTGCCTGACT CCCCGTCGTG 1320
TAGATAACTA CGATACGGGA GGGCTTACCA TCTGGCCCCA GTGCTGCAAT GATACCGCGA 1380
GACCCACGCT CACCGGCTCC AGATTTATCA GCAATAAACC AGCCAGCCGG AAGGGCCGAG 1440
CGCAGAAGTG GTCCTGCAAC TTTATCCGCC TCCATCCAGT CTATTAATTG TTGCCGGGAA 1500
GCTAGAGTAA GTAGTTCGCC AGTTAATAGT TTGCGCAACG TTGTTGCCAT TGCTACAGGC 1560
ATCGTGGTGT CACGCTCGTC GTTTGGTATG GCTTCATTCA GCTCCGGTTC CCAACGATCA 1620
AGGCGAGTTA CATGATCCCC CATGTTGTGC AAAAAAGCGG TTAGCTCCTT CGGTCCTCCG 1680
ATCGTTGTCA GAAGTAAGTT GGCCGCAGTG TTATCACTCA TGGTTATGGC AGCACTGCAT 1740
AATTCTCTTA CTGTCATGCC ATCCGTAAGA TGCTTTTCTG TGACTGGTGA GTACTCAACC 1800
AAGTCATTCT GAGAATAGTG TATGCGGCGA CCGAGTTGCT CTTGCCCGGC GTCAATACGG 1860
GATAATACCG CGCCACATAG CAGAACTTTA AAAGTGCTCA TCATTGGAAA ACGTTCTTCG 1920
GGGCGAAAAC TCTCAAGGAT CTTACCGCTG TTGAGATCCA GTTCGATGTA ACCCACTCGT 1980
GCACCCAACT GATCTTCAGC ATCTTTTACT TTCACCAGCG TTTCTGGGTG AGCAAAAACA 2040
GGAAGGCAAA ATGCCGCAAA AAAGGGAATA AGGGCGACAC GGAAATGTTG AATACTCATA 2100
CTCTTCCTTT TTCAATATTA TTGAAGCATT TATCAGGGTT ATTGTCTCAT GAGCGGATAC 2160
ATATTTGAAT GTATTTAGAA AAATAAACAA ATAGGGGTTC CGCGCACATT TCCCCGAAAA 2220
GTGCCACCTA AATTGTAAGC GTTAATATTT TGTTAAAATT CGCGTTAAAT TTTTGTTAAA 2280
TCAGCTCATT TTTTAACCAA TAGGCCGAAA TCGGCAAAAT CCCTTATAAA TCAAAAGAAT 2340
AGACCGAGAT AGGGTTGAGT GTTGTTCCAG TTTGGAACAA GAGTCCACTA TTAAAGAACG 2400
TGGACTCCAA CGTCAAAGGG CGAAAAACCG TCTATCAGGG CGATGGCCCA CTACGTGAAC 2460
CATCACCCTA ATCAAGTTTT TTGGGGTCGA GGTGCCGTAA AGCACTAAAT CGGAACCCTA 2520
AAGGGAGCCC CCGATTTAGA GCTTGACGGG GAAAGCCGGC GAACGTGGCG AGAAAGGAAG 2580
GGAAGAAAGC GAAAGGAGCG GGCGCTAGGG CGCTGGCAAG TGTAGCGGTC ACGCTGCGCG 2640
TAACCACCAC ACCCGCCGCG CTTAATGCGC CGCTACAGGG CGCGTCCCAT TCGCCATTCA 2700
GGCTGCGCAA CTGTTGGGAA GGGCGATCGG TGCGGGCCTC TTCGCTATTA CGCCAGCTGG 2760
CGAAAGGGGG ATGTGCTGCA AGGCGATTAA GTTGGGTAAC GCCAGGGTTT TCCCAGTCAC 2820
GACGTTGTAA AACGACGGCC AGTGAGCGCG CGTAATACGA CTCACTATAG GGCGAATTGG 2880
AGCTCCACCG CGGTGGCGGC CGCTCTAGAA CTAGTGGATC CGTCGACTAG AGGGCCCGAC 2940
GTCGAACTTA GGCACTAAGG GATGTGAGGC CAGCATCACC GTTGCAGAAA TTGACACAAG 3000
CATCACCACA ATTTTCCAAA TAGAGTTTCA TTTCTTCGTC GTCAGCAGCT GCGTTGACCA 3060
TGTAGTCACA CATGGAAGCC CTACACCCCA AGTTGCAATA CTTGACGGTG TCTGGTTCAT 3120
CTGAGTTGGA CACAAGGGCC AATTTGGGGA AGCCTGTAGG GCATTTTCCG CTACTTGTGA 3180
GTTTACACCT ACAGACGCCT GCGCATAACT TCTGAGCACC ACGGACGCGG CAAAGGTTGT 3240
AGCAGTTTCT TCCTAGGGTG CTCCTGCAGC AACTCTTGCC TTCTACTTGC ACCTGTTCGA 3300
GAACCAACCC CAGTATAAGT AAACACACCA TCACACCCTT GAGGCCCTTG CTGGTGGCCA 3360



(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3365 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

TCGACCTCGA GGGGGGGCCC GGTACCCAGC TTTTGTTCCC TTTAGTGAGG GTTAATTGCG 60
CGCTTGGCGT AATCATGGTC ATAGCTGTTT CCTGTGTGAA ATTGTTATCC GCTCACAATT 120
CCACACAACA TACGAGCCGG AAGCATAAAG TGTAAAGCCT GGGGTGCCTA ATGAGTGAGC 180
TAACTCACAT TAATTGCGTT GCGCTCACTG CCCGCTTTCC AGTCGGGAAA CCTGTCGTGC 240
CAGCTGCATT AATGAATCGG CCAACGCGCG GGGAGAGGCG GTTTGCGTAT TGGGCGCTCT 300
TCCGCTTCCT CGCTCACTGA CTCGCTGCGC TCGGTCGTTC GGCTGCGGCG AGCGGTATCA 360
GCTCACTCAA AGGCGGTAAT ACGGTTATCC ACAGAATCAG GGGATAACGC AGGAAAGAAC 420
ATGTGAGCAA AAGGCCAGCA AAAGGCCAGG AACCGTAAAA AGGCCGCGTT GCTGGCGTTT 480
TTCCATAGGC TCCGCCCCCC TGACGAGCAT CACAAAAATC GACGCTCAAG TCAGAGGTGG 540
CGAAACCCGA CAGGACTATA AAGATACCAG GCGTTTCCCC CTGGAAGCTC CCTCGTGCGC 600
TCTCCTGTTC CGACCCTGCC GCTTACCGGA TACCTGTCCG CCTTTCTCCC TTCGGGAAGC 660
GTGGCGCTTT CTCATAGCTC ACGCTGTAGG TATCTCAGTT CGGTGTAGGT CGTTCGCTCC 720
AAGCTGGGCT GTGTGCACGA ACCCCCCGTT CAGCCCGACC GCTGCGCCTT ATCCGGTAAC 780
TATCGTCTTG AGTCCAACCC GGTAAGACAC GACTTATCGC CACTGGCAGC AGCCACTGGT 840
AACAGGATTA GCAGAGCGAG GTATGTAGGC GGTGCTACAG AGTTCTTGAA GTGGTGGCCT 900
AACTACGGCT ACACTAGAAG GACAGTATTT GGTATCTGCG CTCTGCTGAA GCCAGTTACC 960
TTCGGAAAAA GAGTTGGTAG CTCTTGATCC GGCAAACAAA CCACCGCTGG TAGCGGTGGT 1020
TTTTTTGTTT GCAAGCAGCA GATTACGCGC AGAAAAAAAG GATCTCAAGA AGATCCTTTG 1080
ATCTTTTCTA CGGGGTCTGA CGCTCAGTGG AACGAAAACT CACGTTAAGG GATTTTGGTC 1140
ATGAGATTAT CAAAAAGGAT CTTCACCTAG ATCCTTTTAA ATTAAAAATG AAGTTTTAAA 1200
TCAATCTAAA GTATATATGA GTAAACTTGG TCTGACAGTT ACCAATGCTT AATCAGTGAG 1260
GCACCTATCT CAGCGATCTG TCTATTTCGT TCATCCATAG TTGCCTGACT CCCCGTCGTG 1320
TAGATAACTA CGATACGGGA GGGCTTACCA TCTGGCCCCA GTGCTGCAAT GATACCGCGA 1380
GACCCACGCT CACCGGCTCC AGATTTATCA GCAATAAACC AGCCAGCCGG AAGGGCCGAG 1440
CGCAGAAGTG GTCCTGCAAC TTTATCCGCC TCCATCCAGT CTATTAATTG TTGCCGGGAA 1500
GCTAGAGTAA GTAGTTCGCC AGTTAATAGT TTGCGCAACG TTGTTGCCAT TGCTACAGGC 1560
ATCGTGGTGT CACGCTCGTC GTTTGGTATG GCTTCATTCA GCTCCGGTTC CCAACGATCA 1620
AGGCGAGTTA CATGATCCCC CATGTTGTGC AAAAAAGCGG TTAGCTCCTT CGGTCCTCCG 1680
ATCGTTGTCA GAAGTAAGTT GGCCGCAGTG TTATCACTCA TGGTTATGGC AGCACTGCAT 1740
AATTCTCTTA CTGTCATGCC ATCCGTAAGA TGCTTTTCTG TGACTGGTGA GTACTCAACC 1800
AAGTCATTCT GAGAATAGTG TATGCGGCGA CCGAGTTGCT CTTGCCCGGC GTCAATACGG 1860
GATAATACCG CGCCACATAG CAGAACTTTA AAAGTGCTCA TCATTGGAAA ACGTTCTTCG 1920
GGGCGAAAAC TCTCAAGGAT CTTACCGCTG TTGAGATCCA GTTCGATGTA ACCCACTCGT 1980
GCACCCAACT GATCTTCAGC ATCTTTTACT TTCACCAGCG TTTCTGGGTG AGCAAAAACA 2040
GGAAGGCAAA ATGCCGCAAA AAAGGGAATA AGGGCGACAC GGAAATGTTG AATACTCATA 2100
CTCTTCCTTT TTCAATATTA TTGAAGCATT TATCAGGGTT ATTGTCTCAT GAGCGGATAC 2160
ATATTTGAAT GTATTTAGAA AAATAAACAA ATAGGGGTTC CGCGCACATT TCCCCGAAAA 2220
GTGCCACCTA AATTGTAAGC GTTAATATTT TGTTAAAATT CGCGTTAAAT TTTTGTTAAA 2280
TCAGCTCATT TTTTAACCAA TAGGCCGAAA TCGGCAAAAT CCCTTATAAA TCAAAAGAAT 2340
AGACCGAGAT AGGGTTGAGT GTTGTTCCAG TTTGGAACAA GAGTCCACTA TTAAAGAACG 2400
TGGACTCCAA CGTCAAAGGG CGAAAAACCG TCTATCAGGG CGATGGCCCA CTACGTGAAC 2460
CATCACCCTA ATCAAGTTTT TTGGGGTCGA GGTGCCGTAA AGCACTAAAT CGGAACCCTA 2520
AAGGGAGCCC CCGATTTAGA GCTTGACGGG GAAAGCCGGC GAACGTGGCG AGAAAGGAAG 2580
GGAAGAAAGC GAAAGGAGCG GGCGCTAGGG CGCTGGCAAG TGTAGCGGTC ACGCTGCGCG 2640
TAACCACCAC ACCCGCCGCG CTTAATGCGC CGCTACAGGG CGCGTCCCAT TCGCCATTCA 2700
GGCTGCGCAA CTGTTGGGAA GGGCGATCGG TGCGGGCCTC TTCGCTATTA CGCCAGCTGG 2760
CGAAAGGGGG ATGTGCTGCA AGGCGATTAA GTTGGGTAAC GCCAGGGTTT TCCCAGTCAC 2820
GACGTTGTAA AACGACGGCC AGTGAGCGCG CGTAATACGA CTCACTATAG GGCGAATTGG 2880
AGCTCCACCG CGGTGGCGGC CGCTCTAGAA CTAGTGGATC CGTCGACTAG AGGGCCCGAC 2940
GTCGAACTTA GGCACTAAGG GATGTGAGGC CAGCATCACC GTTGCAGAAA TTGACACAAG 3000
CATCACCACA ATTTTCCAAA TAGAGTTTCA TTTCTTCGTC GTCAGCAGCT GCGTTGACCA 3060
TGTAGTCACA CATGGAAGCC CTACACCCCA AGTTGCAATA CTTGACGGTG TCTGGTTCAT 3120
CTGAGTTGGA CACAAGGGCC AATTTGGGGA AGCCTTTCGG GCATTTTCCG CTACTAGTCA 3180
GCTTACACTT GCAGACGCCT GCGCAAAGCT TCTTGGCGCC TTTGACTTTG CAAAGGTTGT 3240
AGCACTTCCT TCCCAGGGTA CTCTTGCAGC AACTCTTGCC TTCTACTTGC ACCTGTTCGA 3300
GAACCAACCC CAGTATAAGT AAACACACCA TCACACCCTT GAGGCCCTTG CTGGTGGCCA 3360



(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5360 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

CTAAATTGTA AGCGTTAATA TTTTGTTAAA ATTCGCGTTA AATTTTTGTT AAATCAGCTC 60
ATTTTTTAAC CAATAGGCCG AAATCGGCAA AATCCCTTAT AAATCAAAAG AATAGACCGA 120
GATAGGGTTG AGTGTTGTTC CAGTTTGGAA CAAGAGTCCA CTATTAAAGA ACGTGGACTC 180
CAACGTCAAA GGGCGAAAAA CCGTCTATCA GGGCGATGGC CCACTACGTG AACCATCACC 240
CTAATCAAGT TTTTTGGGGT CGAGGTGCCG TAAAGCACTA AATCGGAACC CTAAAGGGAG 300
CCCCCGATTT AGAGCTTGAC GGGGAAAGCC GGCGAACGTG GCGAGAAAGG AAGGGAAGAA 360
AGCGAAAGGA GCGGGCGCTA GGGCGCTGGC AAGTGTAGCG GTCACGCTGC GCGTAACCAC 420
CACACCCGCC GCGCTTAATG CGCCGCTACA GGGCGCGTCC CATTCGCCAT TCAGGCTGCG 480
CAACTGTTGG GAAGGGCGAT CGGTGCGGGC CTCTTCGCTA TTACGCCAGC TGGCGAAAGG 540
GGGATGTGCT GCAAGGCGAT TAAGTTGGGT AACGCCAGGG TTTTCCCAGT CACGACGTTG 600
TAAAACGACG GCCAGTGAGC GCGCGTAATA CGACTCACTA TAGGGCGAAT TGGAGCTCCA 660
CCGCGGTGGC GGCCGCTCTA GATTATATAA TTTATAAGCT AAACAACCCG GCCCTAAAGC 720
ACTATCGTAT CACCTATCTA AATAAGTCAC GGGAGTTTCG AACGTCCACT TCGTCGCACG 780
GAATTGCATG TTTCTTGTTG GAAGCATATT CACGCAATCT CCACACATAA AGGTTTATGT 840
ATAAACTTAC ATTTAGCTCA GTTTAATTAC AGTCTTATTT GGATGCATAT GTATGGTTCT 900
CAATCCATAT AAGTTAGAGT AAAAAATAAG TTTAAATTTT ATCTTAATTC ACTCCAACAT 960
ATATGGATCT ACAATACTCA TGTGCATCCA AACAAACTAC TTATATTGAG GTGAATTTGG 1020
TAGAAATTAA ACTAACTTAC ACACTAAGCC AATCTTTACT ATATTAAAGC ACCAGTTTCA 1080
ACGATCGTCC CGCGTCAATA TTATTAAAAA ACTCCTACAT TTCTTTATAA TCAACCCGCA 1140
CTCTTATAAT CTCTTCTCTA CTACTATAAT AAGAGAGTTT ATGTACAAAA TAAGGTGAAA 1200
TTATCTATAA GTGTTCTGGA TATTGGTTGT TGGCTCCCAT ATTCACACAA CCTAATCAAT 1260
AGAAAACATA TGTTTTATTA AAACAAAATT TATCATATAT CATATATATA TATATATCAT 1320
ATATATATAT AAACCGTAGC AATGCACGGG CATATAACTA GTGCAACTTA ATACATGTGT 1380
GTATTAAGAT GAATAAGAGG GTATCCAAAT AAAAAACTTG TTGCTTACGT ATGGATCGAA 1440
AGGGGTTGGA AACGATTAAA CGATTAAATC TCTTCCTAGT CAAAATTGAA TAGAAGGAGA 1500
TTTAATATAT CCCAATCCCC TTCGATCATC CAGGTGCAAC CGTATAAGTC CTAAAGTGGT 1560
GAGGAACACG AAAGAACCAT GCATTGGCAT GTAAAGCTCC AAGAATTTGT TGTATCCTTA 1620
ACAACTCACA GAACATCAAC CAAAATTGCA CGTCAAGGGT ATTGGGTAAG AAACAATCAA 1680
ACAAATCCTC TCTGTGTGCA AAGAAACACG GTGAGTCATG CCGAGATCAT ACTCATCTGA 1740
TATACATGCT TACAGCTCAC AAGACATTAC AAACAACTCA TATTGCATTA CAAAGATCGT 1800
TTCATGAAAA ATAAAATAGG CCGGACAGGA CAAAAATCCT TGACGTGTAA AGTAAATTTA 1860
CAACAAAAAA AAAGCCATAT GTCAAGCTAA ATCTAATTCG TTTTACGTAG ATCAACAACC 1920
TGTAGAAGGC AACAAAACTG AGCCACGCAG AAGTACAGAA TGATTCCAGA TGAACCATCG 1980
ACGTGCTACG TAAAGAGAGT GACGAGTCAT ATACATTTGG CAAGAAACCA TGAAGCTGCC 2040
TACAGCCGTC TCGGTGGCAT AAGAACACAA GAAATTGTGT TAATTAATCA AAGCTATAAA 2100
TAACGCTCGC ATGCCTGTGC ACTTCTCCAT CACCACCACT GGGTCTTCAG ACCATTAGCT 2160
TTATCTACTC CAGAGCGCAG AAGAACCCGA TCGACACCAT GGCCACCAGC AAGGGCCTCA 2220
AGGGTGTGAT GGTGTGTTTA CTTATACTGG GGTTGGTTCT CGAACAGGTG CAAGTAGAAG 2280
GCAAGAGTTG CTGCAAGAGT ACCCTGGGAA GGAAGTGCTA CAACCTTTGC AAAGTCAAAG 2340
GCGCCAAGAA GCTTTGCGCA GGCGTCTGCA AGTGTAAGCT GACTAGTAGC GGAAAATGCC 2400
CGAAAGGCTT CCCCAAATTG GCCCTTGTGT CCAACTCAGA TGAACCAGAC ACCGTCAAGT 2460
ATTGCAACTT GGGGTGTAGG GCTTCCATGT GTGACTACAT GGTCAACGCA GCTGCTGACG 2520
ACGAAGAAAT GAAACTCTAT TTGGAAAATT GTGGTGATGC TTGTGTCAAT TTCTGCAACG 2580
GTGATGCTGG CCTCACATCC CTTAGTGCCT AAGTTCGACG TCGGGCCCTC TAGTCGACGG 2640
ATCCCCGGCG GTGTCCCCCA CTGAAGAAAC TATGTGCTGT AGTATAGCCG CTGCCCGCTG 2700
GCTAGCTAGC TAGTTGAGTC ATTTAGCGGC GATGATTGAG TAATAATGTG TCACGCATCA 2760
CCATGCATGG GTGGCAGTGT CAGTGTGAGC AATGACCTGA ATGAACAATT GAAATGAAAA 2820
GAAAAAAGTA TTGTTCCAAA TTAAACGTTT TAACCTTTTA ATAGGTTTAT ACAATAATTG 2880
ATATATGTTT TCTGTATATG TCTAATTTGT TATCATCCAT TTAGATATAG ACAAAAAAAA 2940
ATCTAAGAAC TAAAACAAAT GCTAATTTGA AATGAAGGGA GTATATATTG GGATAATGTC 3000
GATGAGATCC CTCGTAATAT CACCGACATC ACACGTGTCC AGTTAATGTA TCAGTGATAC 3060
GTGTATTCAC ATTTGTTGCG CGTAGGCGTA CCCAACAATT TTGATCGACT ATCAGAAAGT 3120
CAACGGAAGC GAGTCGACCT CGAGGGGGGG CCCGGTACCC AGCTTTTGTT CCCTTTAGTG 3180
AGGGTTAATT GCGCGCTTGG CGTAATCATG GTCATAGCTG TTTCCTGTGT GAAATTGTTA 3240
TCCGCTCACA ATTCCACACA ACATACGAGC CGGAAGCATA AAGTGTAAAG CCTGGGGTGC 3300
CTAATGAGTG AGCTAACTCA CATTAATTGC GTTGCGCTCA CTGCCCGCTT TCCAGTCGGG 3360
AAACCTGTCG TGCCAGCTGC ATTAATGAAT CGGCCAACGC GCGGGGAGAG GCGGTTTGCG 3420
TATTGGGCGC TCTTCCGCTT CCTCGCTCAC TGACTCGCTG CGCTCGGTCG TTCGGCTGCG 3480
GCGAGCGGTA TCAGCTCACT CAAAGGCGGT AATACGGTTA TCCACAGAAT CAGGGGATAA 3540
CGCAGGAAAG AACATGTGAG CAAAAGGCCA GCAAAAGGCC AGGAACCGTA AAAAGGCCGC 3600
GTTGCTGGCG TTTTTCCATA GGCTCCGCCC CCCTGACGAG CATCACAAAA ATCGACGCTC 3660
AAGTCAGAGG TGGCGAAACC CGACAGGACT ATAAAGATAC CAGGCGTTTC CCCCTGGAAG 3720
CTCCCTCGTG CGCTCTCCTG TTCCGACCCT GCCGCTTACC GGATACCTGT CCGCCTTTCT 3780
CCCTTCGGGA AGCGTGGCGC TTTCTCATAG CTCACGCTGT AGGTATCTCA GTTCGGTGTA 3840
GGTCGTTCGC TCCAAGCTGG GCTGTGTGCA CGAACCCCCC GTTCAGCCCG ACCGCTGCGC 3900
CTTATCCGGT AACTATCGTC TTGAGTCCAA CCCGGTAAGA CACGACTTAT CGCCACTGGC 3960
AGCAGCCACT GGTAACAGGA TTAGCAGAGC GAGGTATGTA GGCGGTGCTA CAGAGTTCTT 4020
GAAGTGGTGG CCTAACTACG GCTACACTAG AAGGACAGTA TTTGGTATCT GCGCTCTGCT 4080
GAAGCCAGTT ACCTTCGGAA AAAGAGTTGG TAGCTCTTGA TCCGGCAAAC AAACCACCGC 4140
TGGTAGCGGT GGTTTTTTTG TTTGCAAGCA GCAGATTACG CGCAGAAAAA AAGGATCTCA 4200
AGAAGATCCT TTGATCTTTT CTACGGGGTC TGACGCTCAG TGGAACGAAA ACTCACGTTA 4260
AGGGATTTTG GTCATGAGAT TATCAAAAAG GATCTTCACC TAGATCCTTT TAAATTAAAA 4320
ATGAAGTTTT AAATCAATCT AAAGTATATA TGAGTAAACT TGGTCTGACA GTTACCAATG 4380
CTTAATCAGT GAGGCACCTA TCTCAGCGAT CTGTCTATTT CGTTCATCCA TAGTTGCCTG 4440
ACTCCCCGTC GTGTAGATAA CTACGATACG GGAGGGCTTA CCATCTGGCC CCAGTGCTGC 4500
AATGATACCG CGAGACCCAC GCTCACCGGC TCCAGATTTA TCAGCAATAA ACCAGCCAGC 4560
CGGAAGGGCC GAGCGCAGAA GTGGTCCTGC AACTTTATCC GCCTCCATCC AGTCTATTAA 4620
TTGTTGCCGG GAAGCTAGAG TAAGTAGTTC GCCAGTTAAT AGTTTGCGCA ACGTTGTTGC 4680
CATTGCTACA GGCATCGTGG TGTCACGCTC GTCGTTTGGT ATGGCTTCAT TCAGCTCCGG 4740
TTCCCAACGA TCAAGGCGAG TTACATGATC CCCCATGTTG TGCAAAAAAG CGGTTAGCTC 4800
CTTCGGTCCT CCGATCGTTG TCAGAAGTAA GTTGGCCGCA GTGTTATCAC TCATGGTTAT 4860
GGCAGCACTG CATAATTCTC TTACTGTCAT GCCATCCGTA AGATGCTTTT CTGTGACTGG 4920
TGAGTACTCA ACCAAGTCAT TCTGAGAATA GTGTATGCGG CGACCGAGTT GCTCTTGCCC 4980
GGCGTCAATA CGGGATAATA CCGCGCCACA TAGCAGAACT TTAAAAGTGC TCATCATTGG 5040
AAAACGTTCT TCGGGGCGAA AACTCTCAAG GATCTTACCG CTGTTGAGAT CCAGTTCGAT 5100
GTAACCCACT CGTGCACCCA ACTGATCTTC AGCATCTTTT ACTTTCACCA GCGTTTCTGG 5160
GTGAGCAAAA ACAGGAAGGC AAAATGCCGC AAAAAAGGGA ATAAGGGCGA CACGGAAATG 5220
TTGAATACTC ATACTCTTCC TTTTTCAATA TTATTGAAGC ATTTATCAGG GTTATTGTCT 5280
CATGAGCGGA TACATATTTG AATGTATTTA GAAAAATAAA CAAATAGGGG TTCCGCGCAC 5340
ATTTCCCCGA AAAGTGGCAC 5360



(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5511 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

TCGCGCGTTT CGGTGATGAC GGTGAAAACC TCTGACACAT GCAGCTCCCG GAGACGGTCA 60
CAGCTTGTCT GTAAGCGGAT GCCGGGAGCA GACAAGCCCG TCAGGGCGCG TCAGCGGGTG 120
TTGGCGGGTG TCGGGGCTGG CTTAACTATG CGGCATCAGA GCAGATTGTA CTGAGAGTGC 180
ACCATATGCG GTGTGAAATA CCGCACAGAT GCGTAAGGAG AAAATACCGC ATCAGGCGCC 240
ATTCGCCATT CAGGCTGCGC AACTGTTGGG AAGGGCGATC GGTGCGGGCC TCTTCGCTAT 300
TACGCCAGCT GGCGAAAGGG GGATGTGCTG CAAGGCGATT AAGTTGGGTA ACGCCAGGGT 360
TTTCCCAGTC ACGACGTTGT AAAACGACGG CCAGTGAATT CTTTTATGAA TAATAATAAT 420
GCATATCTGT GCATTACTAC CTGGGATACA AGGGCTTCTC CGCCATAACA AATTGAGTTG 480
CGATGCTGAG AACGAACGGG GAAGAAAGTA AGCGCCGCCC AAAAAAAACG AACATGTACG 540
TCGGCTATAG CAGGTGAAAG TTCGTGCGCC AATGAAAAGG GAACGATATG CGTTGGGTAG 600
TTGGGATACT TAAATTTGGA GAGTTTGTTG CATACACTAA TCCACTAAAG TTGTCTATCT 660
TTTTAACAGC TCTAGGCAGG ATATAAGATT TATATCTAAT CTGTTGGAGT TGCTTTTAGA 720
GTAACTTTTC TCTCTGTTTC GTTTATAGCC GATTAGCACA AAATTAAACT AGGTGACGAG 780
AAATAAAGAA AAACGGAGGC AGTAAAAAAT ACCCAAAAAA ATACTTGGAG ATTTTTGTCT 840
CAAAATTATC TTCTAATTTT AAAAGCTACA TATTAAAAAT ACTATATATT AAAAATACTT 900
CGAGATCATT GCTTGGGATG GGCAGGGCCA ATAGCTAATT GCTAAGGATG GGCTATATTT 960
ATGTATCGTC TGAAACATGT AGGGGCTAAT AGTTAGATGA CTAATTTGCT GTGTTCGTAC 1020
GGGGTGCTGT TTGAGCCTAG CGATGAAGGG TCATAGTTTC ATACAAGAAC TCACTTTTGG 1080
TTCGTCTGCT GTGTCTGTTC TCAGCGTAAC GGCATCAATG GATGCCAAAC TCCGCAAGGG 1140
GACAAATGAA GAAGCGAAGA GATTATAGAA CACGCACGTG TCATTATTTA TTTATGGACT 1200
TGCCTCAGTA GCTTACAGCA TCGTACCCGC ACGTACATAC TACAGAGCCA CACTTATTGC 1260
ACTGCCTGCC GCTTACGTAC ATAGTTAACA CGCAGAGAGG TATATACATA CACGTCCAAC 1320
GTCTCCACTC AGGCTCATGC TACGTACGCA CGTCGGTCGC GCGCCACCCT CTCGTTGCTT 1380
CCTGCTCGTT TTGGCGAGCT AGAGGGCCCG ACGTCGAACT TAGGCACTAA GGGATGTGAG 1440
GCCAGCATCA CCGTTGCAGA AATTGACACA AGCATCACCA CAATTTTCCA AATAGAGTTT 1500
CATTTCTTCG TCGTCAGCAG CTGCGTTGAC CATGTAGTCA CACATGGAAG CCCTACACCC 1560
CAAGTTGCAA TACTTGACGG TGTCTGGTTC ATCTGAGTTG GACACAAGGG CCAATTTGGG 1620
GAAGCCTTTC GGGCATTTTC CGCTACTAGT CAGCTTACAC TTGCAGACGC CTGCGCAAAG 1680
CTTCTTGGCG CCTTTGACTT TGCAAAGGTT GTAGCACTTC CTTCCCAGGG TACTCTTGCA 1740
GCAACTCTTG CCTTCTACTT GCACCTGTTC GAGAACCAAC CCCAGTATAA GTAAACACAC 1800
CATCACACCC TTGAGGCCCT TGCTGGTGGC CATGGTGTAG TGTCGACTGT GATATCCTCG 1860
GGTGTGTGTT GGATCCTTGG GTTGGCTGTA TGCAGAACTA AAGCGGAGGT GGCGCGCATT 1920
TATACCAGCG CCGGGCCCTG GTACGTGGCG CGGCCGCGCG GCTACGTGGA GGAAGGCTGC 1980
GTGGCAGCAG ACACACGGGT CGCCACGTCC CGCCGTACTC TCCTTACCGT GCTTATCCGG 2040
GCTCCGGCTC GGTGCACGCC AGGGTGTGGC CGCCTCTGAG CAGACTTTGT CGTGTTCCAC 2100
AGTGGTGTCG TGTTCCGGGG ACTCCGATCC GCGGCGAGCG ACCGAGCGTG TAAAAGAGTT 2160
CCTACTAGGT ACGTTCATTG TATCTGGACG ACGGGCAGCG GACAATTTGC TGTAAGAGAG 2220
GGGCAGTTTT TTTTTAGAAA AACAGAGAAT TCCGTTGAGC TAATTGTAAT TCAACAAATA 2280
AGCTATTAGT TGGTTTTAGC TTAGATTAAA GAAGCTAACG ACTAATAGCT AATAATTAGT 2340
TGGTCTATTA GTTGACTCAT TTTAAGGCCC TGTTTCAATC TCGCGAGATA AACTTTAGCA 2400
GCTATTTTTT AGCTACTTTT AGCCATTTGT AATCTAAACA GGAGAGCTAA TGGTGGTAAT 2460
TGAAACTAAA CTTTAGCACT TCAATTCATA TAGCTAAAGT TTAGCAGGAA GCTAAACTTT 2520
ATCCCGTGAG ATTGAAACGG GGCCTAAATC TCTCAGCTAT TTTTGATGCA AATTACTGTC 2580
ACTACTGGAA TCGAGCGCTT TGCCGAGTGT CAAAGCCTGA AAAACACTCC GTAAAGACTT 2640
TGCCTAGTGT GACACTCGAC AAAGAGATCT CGACGAACAG TACATCGACA ACGGCTTCTT 2700
TGTCGAGTAC TTTTTATCGG ACACTTGACA AAGTCTTTGT CGAGTGAACT ACATTGAAAC 2760
TCTATGATTT TATGTGTAGG TCACTTAGGT TTCTACACAT AGTACGTCAC AACTTTACCG 2820
AAACATTATC AAATTTTTAT CACAACCTCT ATATATGATA TCATGACATG TGGACAAGTT 2880
TCATTAATTT CTGACTTTAT TTGTGTTTTA TACAATTTTT AAACAACTAG ATAACAAGTT 2940
CACGGTCATG TTTAGTGAGC ATGGTGCTTG AAGATTCTGG TCTGCTTCTG AAATCGGTCG 3000
TAACTTGTGC TAGATAACAT GCATATCATT TATTTTGCAT GCACGGTTTT CCATGTTTCG 3060
AGTGACTTGC AGTTTAAATG TGAATTTTCC GAAGAAATTC AAATAAACGA ACTAAATCTA 3120
ATATTTATAG AAAACATTTT TGTAAATATG TAATTGTGCC AAAATGGTAC ATGTAGATCT 3180
ACATAGTGTA GGAACATACC ACAAAAAGTT TGGTTGGCAA AATAAAAAAA ATAAAATATA 3240
CTTTATCGAG TGTCCAAGGA TGGCACTCGG CAAGCTTGGC GTAATCATGG TCATAGCTGT 3300
TTCCTGTGTG AAATTGTTAT CCGCTCACAA TTCCACACAA CATACGAGCC GGAAGCATAA 3360
AGTGTAAAGC CTGGGGTGCC TAATGAGTGA GCTAACTCAC ATTAATTGCG TTGCGCTCAC 3420
TGCCCGCTTT CCAGTCGGGA AACCTGTCGT GCCAGCTGCA TTAATGAATC GGCCAACGCG 3480
CGGGGAGAGG CGGTTTGCGT ATTGGGCGCT CTTCCGCTTC CTCGCTCACT GACTCGCTGC 3540
GCTCGGTCGT TCGGCTGCGG CGAGCGGTAT CAGCTCACTC AAAGGCGGTA ATACGGTTAT 3600
CCACAGAATC AGGGGATAAC GCAGGAAAGA ACATGTGAGC AAAAGGCCAG CAAAAGGCCA 3660
GGAACCGTAA AAAGGCCGCG TTGCTGGCGT TTTTCCATAG GCTCCGCCCC CCTGACGAGC 3720
ATCACAAAAA TCGACGCTCA AGTCAGAGGT GGCGAAACCC GACAGGACTA TAAAGATACC 3780
AGGCGTTTCC CCCTGGAAGC TCCCTCGTGC GCTCTCCTGT TCCGACCCTG CCGCTTACCG 3840
GATACCTGTC CGCCTTTCTC CCTTCGGGAA GCGTGGCGCT TTCTCAATGC TCACGCTGTA 3900
GGTATCTCAG TTCGGTGTAG GTCGTTCGCT CCAAGCTGGG CTGTGTGCAC GAACCCCCCG 3960
TTCAGCCCGA CCGCTGCGCC TTATCCGGTA ACTATCGTCT TGAGTCCAAC CCGGTAAGAC 4020
ACGACTTATC GCCACTGGCA GCAGCCACTG GTAACAGGAT TAGCAGAGCG AGGTATGTAG 4080
GCGGTGCTAC AGAGTTCTTG AAGTGGTGGC CTAACTACGG CTACACTAGA AGGACAGTAT 4140
TTGGTATCTG CGCTCTGCTG AAGCCAGTTA CCTTCGGAAA AAGAGTTGGT AGCTCTTGAT 4200
CCGGCAAACA AACCACCGCT GGTAGCGGTG GTTTTTTTGT TTGCAAGCAG CAGATTACGC 4260
GCAGAAAAAA AGGATCTCAA GAAGATCCTT TGATCTTTTC TACGGGGTCT GACGCTCAGT 4320
GGAACGAAAA CTCACGTTAA GGGATTTTGG TCATGAGATT ATCAAAAAGG ATCTTCACCT 4380
AGATCCTTTT AAATTAAAAA TGAAGTTTTA AATCAATCTA AAGTATATAT GAGTAAACTT 4440
GGTCTGACAG TTACCAATGC TTAATCAGTG AGGCACCTAT CTCAGCGATC TGTCTATTTC 4500
GTTCATCCAT AGTTGCCTGA CTCCCCGTCG TGTAGATAAC TACGATACGG GAGGGCTTAC 4560
CATCTGGCCC CAGTGCTGCA ATGATACCGC GAGACCCACG CTCACCGGCT CCAGATTTAT 4620
CAGCAATAAA CCAGCCAGCC GGAAGGGCCG AGCGCAGAAG TGGTCCTGCA ACTTTATCCG 4680
CCTCCATCCA GTCTATTAAT TGTTGCCGGG AAGCTAGAGT AAGTAGTTCG CCAGTTAATA 4740
GTTTGCGCAA CGTTGTTGCC ATTGCTACAG GCATCGTGGT GTCACGCTCG TCGTTTGGTA 4800
TGGCTTCATT CAGCTCCGGT TCCCAACGAT CAAGGCGAGT TACATGATCC CCCATGTTGT 4860
GCAAAAAAGC GGTTAGCTCC TTCGGTCCTC CGATCGTTGT CAGAAGTAAG TTGGCCGCAG 4920
TGTTATCACT CATGGTTATG GCAGCACTGC ATAATTCTCT TACTGTCATG CCATCCGTAA 4980
GATGCTTTTC TGTGACTGGT GAGTACTCAA CCAAGTCATT CTGAGAATAG TGTATGCGGC 5040
GACCGAGTTG CTCTTGCCCG GCGTCAATAC GGGATAATAC CGCGCCACAT AGCAGAACTT 5100
TAAAAGTGCT CATCATTGGA AAACGTTCTT CGGGGCGAAA ACTCTCAAGG ATCTTACCGC 5160
TGTTGAGATC CAGTTCGATG TAACCCACTC GTGCACCCAA CTGATCTTCA GCATCTTTTA 5220
CTTTCACCAG CGTTTCTGGG TGAGCAAAAA CAGGAAGGCA AAATGCCGCA AAAAAGGGAA 5280
TAAGGGCGAC ACGGAAATGT TGAATACTCA TACTCTTCCT TTTTCAATAT TATTGAAGCA 5340
TTTATCAGGG TTATTGTCTC ATGAGCGGAT ACATATTTGA ATGTATTTAG AAAAATAAAC 5400
AAATAGGGGT TCCGCGCACA TTTCCCCGAA AAGTGCCACC TGACGTCTAA GAAACCATTA 5460
TTATCATGAC ATTAACCTAT AAAAATAGGC GTATCACGAG GCCCTTTCGT C 5511

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5115 base pairs 

(B) TYPE: nucleic acid 
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(C) STRANDEDNESS : single 
{ D ) TOPOLOGY : 1 inear 



(ii) MOLECULE TYPE: Other 



<xi) SEQUENCE DESCRIPTION: SEQ ID NO : 5 : 

GTTGGGAGCT CTCCCATATG GTCGACCTGC AGGCGGCCGC TCTAGAACTA GTGGATCCCC 60 

CCCTCGAGGT CGACGGTATC GATAAGCTTG ATATCTTACA AGGCCCAGCC CAGCGACCTA 120 

TTACACAGCC CGCTCGGGCC CGCGACGTCG GGACACATCT TCTTCCCCCT TTTGGTGAAG 180 

CTCTGCTCGC AGCTGTCCGG CTCCTTGGAC GTTCGTGTGG CAGATTC AT C TGTTGTCTCG 24 0 

TCTCCTGTGC TTCCTGGGTA GCT TGTGTAG TGGAGCTGAC ATGGTCTGAG CAGGCTTAAA 300 

ATTTGCTCGT AGACGAGGAG TACCAGCACA GCACGTTGCG GATTTCTCTG CCTGTGAAGT 3 60 

GCAACGTCTA GGATTGTCAC ACGCCTTGGT CGCGTCGCGT CGCGTCGCGT CGATGCGGTG 420 

GTGAGCAGAG CAGCAACAGC TGGGCGGCCC AACGTTGGCT TCCGTGTCTT CGTCGTACGT 48 0 

ACGCGCGCGC CGGGGACACG CAGCAGAGAG CGGAGAGCGA GCCGTGCACG GGGAGGTGGT 540 

GTGGAAGTGG AGCCGCGCGC CCGGCCGCCC GCGCCCGGTG GGCAACCCAA AAGT AC C C AC 600 

GACAAGCGAA GGCGCCAAAG CGATCCAAGC TCCGGAACGC AACAGCATGC GTCGCGTCGG 660 

AGAGCCAGCC ACAAGCAGCC GAGAACCGAA CCGGTGGGCG ACGCGTCATG GGACGGACGC 720 

GGGCGACGCT TCCAAACGGG CCACGTACGC CGGCGTGTGC GTGCGTGCAG ACGACAAGCC 780 

AAGGCGAGGC AGCCCCCGAT CGGGAAAGCG TTTTGGGCGC GAGCGCTGGC GTGCGGGTCA 84 0 

GTCGCTGGTG CGCAGTGCCG GGGGGAACGG GTATCGTGGG GGGCGCGGGC GGAGGAGAGC 90 0 

GTGGCGAGGG C CGAGAGC AG CGCGCGGCCG GGTCACGCAA CGCGCCCCAC GTACTGCCCT 960 

CCCCCTCCGC GCGCGCTAGA AATACCGAGG CCTGGACCGG GGGGGGGCCC CGTCACATCC 102 0 

ATCCATCGAC CGATCGATCG CCACAGCCAA CACCACCCGC CGAGGCGACG CGACAGCCGC 1080 

CAGGAGGAAG GAATAAACTC ACTGCCAGCC AGTGAAGGGG GAGAAGTGTA CTGCTCCGTC 1140 

GACCAGTGCG CGCACCGCCC GGCAGGGCTG CTCATCTCGT CGACGACCAG GTTCTGTTCC 12 00 

GATCCGATCC GATCCTGTCC TTGAGTTTCG TCCAGATCCT GGCGCGTATC TGCGTGTTTG 12 6 0 

ATGATCCAGG TTCTTCGAAC CTAAAT C TGT CCGTGCACAC GTCTTTTCTC TCTCTCCTAC 1320 

GCAGTGGATT AATCGC CATG GCCACCAGCA AGGGCCTCAA GGGTGTGATG GTGTGTTTAC 1380 

TTATACTGGG GTTGGTTCTC GAACAGGTGC AAGTAGAAGG CAAGAGTTGC TGCAAGAGTA 1440 

CCCTGGGAAG GAAGTGCTAC AACCTTTGCA AAGTCAAAGG CGC CAAGAAG CTTTGCGCAG 150 0 

GCGTCTGCAA GTGTAAGCTG ACTAGTAGCG GAAAATGCCC GAAAGGCTTC CCCAAATTGG 156 0 

CCCTTGTGTC CAACTCAGAT GAACCAGACA CCGTCAAGTA TTGCAACTTG GGGTGTAGGG 162 0 

CTTCCATGTG TGACTACATG GTCAACGCAG CTGCTGACGA CGAAGAAATG AAACTCTATT 1680 

TGGAAAATTG TGGTGATGCT TGTGTCAATT TCTGCAACGG TGATGCTGGC CTCACATCCC 174 0 

TTAGTGCCTA AGTTCGACGT CGGGCCCTCT AGATGCGGCC CGGGTGAAGA GTTCGCCCTG 1800 

CAGGGCCCCT GATCTCGCGC GTGGTGCAAA GATGTTGGGA CATCTTCTTA TATATGCTGT 186 0 

TTCGCTTATG TGATATGGAC AAGTATGTGT AGATGCTTGC TTGTGCTAGT GTAATGTAGT 192 0 

GTAGTGGTGG C C AGTGGC AC AACCTAATAA GCGCATGAAC TAATTGCTTG CGTGTGTAGT 198 0 

TAAGTAC CGA TCGGTAATTT TATATTGCGA GTAAATAAAT GGACCTGTAG TGGTGGAGTA 204 0 

AATAATCCCT GCTGTTCGGT GTTCTTATCG CTCCTCGTAT AGATATTATA TAGAGTACAT 2100 

TTTTCTCTCT CTGAATCCTA CGTGTGTGAA ATTTCTATAT CATTACTGTA AAATTTCTGC 2160

GTTCCAAAAG AGACCATAGC CTATCTTTGG CCCTGTTTGT TTCGGCTTCT GGCAGCTTCT 2220 

GGCCACCAAA AGCTGCTGCG GACTGCCAAA CGCTCAGATT TTCAGCTAGC TTCTATAAAA 2280 

TTAGTTGGGG CAAAAACCAT CCAAAATCAA TATAAACACA TAATCGGTTG AGTCGTTGTA 2340 

ATATTAGGAA TCTGTCACTT TCTAGATCCT GAGCCCTATG AACAACTTTA TCTTTCTCCA 2400 

TACGTAATCG TAATGATACT CAGATTCTCT CCACAGCCAG ATTCTCCTCA CAGCCAGATT 2460 

TTCAGAAAAG CTGGTCAGAA AAAAGTTAAA CCAAACAGAC CCTTTGTGTA TGCATGGATC 2520

GGCTTTCCCC GTCAAGCTCT AAATCGGGGG CTCCCTTTAG GGTTCCGATT TAGAGCTTTA 2580

CGGCACCTCG ACCGCAAAAA ACTTGATTTG GGTGATGGTT CACGTAGTGG GCCATCGCCC 2640

TGATAGACGG TTTTTCGCCC TTTGACGTTG GAGTCCACGT TCTTTAATAG TGGACTCTTG 2700

TTCCAAACTG GAACAACACT CAACCCTATC TCGGTCTATT CTTTTGATTT ATAAGGGATT 2760

TTGCCGATTT CGGCCTATTG GTTAAAAAAT GAGCTGATTT AACAAATATT TAACGCGAAT 2820

TTTAACAAAA TATTAACGTT TACAATTTCG CCTGATGCGG TATTTTCTCC TTACGCATCT 2880

GTGCGGTATT TCACACCGCA TACAGGTGGC ACTTTTCGGG GAAATGTGCG CGGAACCCCT 2940 

ATTTGTTTAT TTTTCTAAAT ACATTCAAAT ATGTATCCGC TCATGAGACA ATAACCCTGA 3000

TAAATGCTTC AATAATATTG AAAAAGGAAG AGTATGAGTA TTCAACATTT CCGTGTCGCC 3060

CTTATTCCCT TTTTTGCGGC ATTTTGCCTT CCTGTTTTTG CTCACCCAGA AACGGTGGTG 3120

AAAGTAAAAG ATGCTGAAGA TCAGTTGGGT GCACGAGTGG GTTACATCGA ACTGGATCTC 3180 

AACAGCGGTA AGATCCTTGA GAGTTTTCGC CCCGAAGAAC GTTTTCCAAT GATGAGCACT 3240 

TTTAAAGTTC TGCTATGTCA TACACTATTA TCCCGTATTG ACGCCGGGCA AGAGCAACTC 3300 

GGTCGCCGGG CGCGGTATTC TCAGAATGAC TTGGTTGAGT ACTCACCAGT CACAGAAAAG 3360

CATCTTACGG ATGGCATGAC AGTAAGAGAA TTATGCAGTG CTGCCATAAC CATGAGTGAT 3420 

AACACTGCGG CCAACTTACT TCTGACAACG ATCGGAGGAC CGAAGGAGCT AACCGCTTTT 3480 

TTGCACAACA TGGGGGATCA TGTAACTCGC CTTGATCGTT GGGAACCGGA GCTGAATGAA 3540

GCCATACCAA ACGACGAGCG TGACACCACG ATGCCTGTAG CAATGCCAAC AACGTTGCGC 3600

AAACTATTAA CTGGCGAACT ACTTACTCTA GCTTCCCGGC AACAATTAAT AGACTGGATG 3660 

GAGGCGGATA AAGTTGCAGG ACCACTTCTG CGCTCGGCCC TTCCGGCTGG CTGGTTTATT 3720

GCTGATAAAT CTGGAGCCGG TGAGCGTGGG TCTCGCGGTA TCATTGCAGC ACTGGGGCCA 3780 

GATGGTAAGC CCTCCCGTAT CGTAGTTATC TACACGACGG GGAGTCAGGC AACTATGGAT 3840 

GAACGAAATA GACAGATCGC TGAGATAGGT GCCTCACTGA TTAAGCATTG GTAACTGTCA 3900 

GACCAAGTTT ACTCATATAT ACTTTAGATT GATTTAAAAC TTCATTTTTA ATTTAAAAGG 3960

ATCTAGGTGA AGATCCTTTT TGATAATCTC ATGACCAAAA TCCCTTAACG TGAGTTTTCG 4020

TTCCACTGAG CGTCAGACCC CGTAGAAAAG ATCAAAGGAT CTTCTTGAGA TCCTTTTTTT 4080 

CTGCGCGTAA TCTGCTGCTT GCAAACAAAA AAACCACCGC TACCAGCGGT GGTTTGTTTG 4140 

CCGGATCAAG AGCTACCAAC TCTTTTTCCG AAGGTAACTG GCTTCAGCAG AGCGCAGATA 4200

CCAAATACTG TCCTTCTAGT GTAGCCGTAG TTAGGCCACC ACTTCAAGAA CTCTGTAGCA 4260

CCGCCTACAT ACCTCGCTCT GCTAATCCTG TTACCAGTGG CTGCTGCCAG TGGCGATAAG 4320

TCGTGTCTTA CCGGGTTGGA CTCAAGACGA TAGTTACCGG ATAAGGCGCA GCGGTCGGGC 4380

TGAACGGGGG GTTCGTGCAC ACAGCCCAGC TTGGAGCGAA CGACCTACAC CGAACTGAGA 4440

TACCTACAGC GTGAGCTATG AGAAAGCGCC ACGCTTCCCG AAGGGAGAAA GGCGGACAGG 4500

TATCCGGTAA GCGGCAGGGT CGGAACAGGA GAGCGCACGA GGGAGCTTCC AGGGGGAAAC 4560

GCCTGGTATC TTTATAGTCC TGTCGGGTTT CGCCACCTCT GACTTGAGCG TCGATTTTTG 4620

TGATGCTCGT CAGGGGGGCG GAGCCTATCG AAAAACGCCA GCAACGCGGC CTTTTTACGG 4680

TTCCTGGCCT TTTGCTGGCC TTTTGCTCAC ATGTTCTTTC CTGCGTTATC CCCTGATTCT 4740

GTGGATAACC GTATTACCGC CTTTGAGTGA GCTGATACCG CTCGCCGCAG CCGAACGACC 4800

GAGCGCAGCG AGTCAGTGAG CGAGGAAGCG GAAGAGCGCC CAATACGCAA ACCGCCTCTC 4860

CCCGCGCGTT GGCCGATTCA TTAATGCAGC TGGCACGACA GGTTTCCCGA CTGGAAAGCG 4920

GGCAGTGAGC GCAACGCAAT TAATGTGAGT TAGCTCACTC ATTAGGCACC CCAGGCTTTA 4980

CACTTTATGC TTCCGGCTCG TATGTTGTGT GGAATTGTGA GCGGATAACA ATTTCACACA 5040

GGAAACAGCT ATGACCATGA TTACGCCAAG CTATTTAGGT GACACTATAG AATACTCAAG 5100 

CTATGCATCC AACGC 5115 
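
The rows of this listing follow a fixed layout: groups of ten bases, with the cumulative position of the last base printed at the end of each row. The following is a minimal sketch, not part of the original disclosure, of how that running counter could be checked; the function names `parse_row` and `check_rows` are hypothetical and chosen only for illustration.

```python
def parse_row(row):
    """Split one listing row into (bases, end_position).

    Assumes the row is whitespace-separated base groups followed by
    a single integer counter, as in the listing above.
    """
    *groups, counter = row.split()
    return "".join(groups), int(counter)


def check_rows(rows, start=0):
    """Return True if every row's counter equals the running base count."""
    position = start
    for row in rows:
        bases, end_position = parse_row(row)
        position += len(bases)
        if position != end_position:
            return False
    return True


# Last two rows of SEQ ID NO: 5; the preceding row ends at position 5040.
ROWS = [
    "GGAAACAGCT ATGACCATGA TTACGCCAAG CTATTTAGGT GACACTATAG AATACTCAAG 5100",
    "CTATGCATCC AACGC 5115",
]

print(check_rows(ROWS, start=5040))  # prints True
```

A check of this kind flags rows in which bases or counters were lost or altered during transcription.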



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5392 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

CTAAATTGTA AGCGTTAATA TTTTGTTAAA ATTCGCGTTA AATTTTTGTT AAATCAGCTC 60 

ATTTTTTAAC CAATAGGCCG AAATCGGCAA AATCCCTTAT AAATCAAAAG AATAGACCGA 120 

GATAGGGTTG AGTGTTGTTC CAGTTTGGAA CAAGAGTCCA CTATTAAAGA ACGTGGACTC 180 

CAACGTCAAA GGGCGAAAAA CCGTCTATCA GGGCGATGGC CCACTACGTG AACCATCACC 240

CTAATCAAGT TTTTTGGGGT CGAGGTGCCG TAAAGCACTA AATCGGAACC CTAAAGGGAG 300

CCCCCGATTT AGAGCTTGAC GGGGAAAGCC GGCGAACGTG GCGAGAAAGG AAGGGAAGAA 360 

AGCGAAAGGA GCGGGCGCTA GGGCGCTGGC AAGTGTAGCG GTCACGCTGC GCGTAACCAC 420 

CACACCCGCC GCGCTTAATG CGCCGCTACA GGGCGCGTCC CATTCGCCAT TCAGGCTGCG 480 

CAACTGTTGG GAAGGGCGAT CGGTGCGGGC CTCTTCGCTA TTACGCCAGC TGGCGAAAGG 540

GGGATGTGCT GCAAGGCGAT TAAGTTGGGT AACGCCAGGG TTTTCCCAGT CACGACGTTG 600

TAAAACGACG GCCAGTGAGC GCGCGTAATA CGACTCACTA TAGGGCGAAT TGGAGCTCCA 660

CCGCGGTGGC GGCCGCTCTA GATTATATAA TTTATAAGCT AAACAACCCG GCCCTAAAGC 720 

ACTATCGTAT CACCTATCTA AATAAGTCAC GGGAGTTTCG AACGTCCACT TCGTCGCACG 780 

GAATTGCATG TTTCTTGTTG GAAGCATATT CACGCAATCT CCACACATAA AGGTTTATGT 840 

ATAAACTTAC ATTTAGCTCA GTTTAATTAC AGTCTTATTT GGATGCATAT GTATGGTTCT 900 

CAATCCATAT AAGTTAGAGT AAAAAATAAG TTTAAATTTT ATCTTAATTC ACTCCAACAT 960

ATATGGATCT ACAATACTCA TGTGCATCCA AACAAACTAC TTATATTGAG GTGAATTTGG 1020 

TAGAAATTAA ACTAACTTAC ACACTAAGCC AATCTTTACT ATATTAAAGC ACCAGTTTCA 1080

ACGATCGTCC CGCGTCAATA TTATTAAAAA ACTCCTACAT TTCTTTATAA TCAACCCGCA 1140 

CTCTTATAAT CTCTTCTCTA CTACTATAAT AAGAGAGTTT ATGTACAAAA TAAGGTGAAA 1200

TTATCTATAA GTGTTCTGGA TATTGGTTGT TGGCTCCCAT ATTCACACAA CCTAATCAAT 1260 

AGAAAACATA TGTTTTATTA AAACAAAATT TATCATATAT CATATATATA TATATATCAT 1320

ATATATATAT AAACCGTAGC AATGCACGGG CATATAACTA GTGCAACTTA ATACATGTGT 1380

GTATTAAGAT GAATAAGAGG GTATCCAAAT AAAAAACTTG TTGCTTACGT ATGGATCGAA 1440 

AGGGGTTGGA AACGATTAAA CGATTAAATC TCTTCCTAGT CAAAATTGAA TAGAAGGAGA 1500 

TTTAATATAT CCCAATCCCC TTCGATCATC CAGGTGCAAC CGTATAAGTC CTAAAGTGGT 1560 

GAGGAACACG AAAGAACCAT GCATTGGCAT GTAAAGCTCC AAGAATTTGT TGTATCCTTA 1620

ACAACTCACA GAACATCAAC CAAAATTGCA CGTCAAGGGT ATTGGGTAAG AAACAATCAA 1680 

ACAAATCCTC TCTGTGTGCA AAGAAACACG GTGAGTCATG CCGAGATCAT ACTCATCTGA 1740 

TATACATGCT TACAGCTCAC AAGACATTAC AAACAACTCA TATTGCATTA CAAAGATCGT 1800

TTCATGAAAA ATAAAATAGG CCGGACAGGA CAAAAATCCT TGACGTGTAA AGTAAATTTA 1860 

CAACAAAAAA AAAGCCATAT GTCAAGCTAA ATCTAATTCG TTTTACGTAG ATCAACAACC 1920

TGTAGAAGGC AACAAAACTG AGCCACGCAG AAGTACAGAA TGATTCCAGA TGAACCATCG 1980

ACGTGCTACG TAAAGAGAGT GACGAGTCAT ATACATTTGG CAAGAAACCA TGAAGCTGCC 2040

TACAGCCGTA TCGGTGGCAT AAGAACACAA GAAATTGTGT TAATTAATCA AAGCTATAAA 2100 

TAACGCTCGC ATGCCTGTGC ACTTCTCCAT CACCACCACT GGGTCTTCAG ACCATTAGCT 2160 

TTATCTACTC CAGAGCGCAG AAGAACCCGA TCGACACCAT GACCAAGTTC ACAATCCTCC 2220

TCATCTCTCT TCTCTTCTGC ATCGCCCACA CTTGCAGCGC CTCCAAATGG CAGCACCAGC 2280

AAGATAGCTG CCGCAAGCAG CTTAAGGGGG TGAACCTCAC GCCCTGCGAG AAGCACATCA 2340 

TGGAGAAGAT CCAAGGCCGC GGCGATGACG ATGATGATGA TGACGACGAC AATCACATTC 2400

TCAGGACCAT GCGGGGGAAG AATCACTACA TACGGAAGAA GGAAGGAAAA GACGAAGACG 2460 

AAGAAGAAGA AGGACACATG CAGAAGTGCT GCGCTTTGCA CTGGCATTTG GGGCTCTTAA 2520 

GCTCGCTCAT TTCTGTGCTG CAGAAGATAA TGGAGAACCA GAGCGAGGAA CTGGAGGAGA 2580

AGGAGAAGAA GAAAATGGAG AAGGAGCTTA TGAACTTGGC TACTATGTGC AGGTTTGGGC 2640 

CCATGATCGG GTGCGACTTG TCCTCCGATG ACTAAGTTGA TCCCCGGCGG TGTCCCCCAC 2700

TGAAGAAACT ATGTGCTGTA GTATAGCCGC TGGCTAGCTA GCTAGTTGAG TCATTTAGCG 2760 

GCGATGATTG AGTAATAATG TGTCACGCAT CACCATGCAT GGGTGGCAGT CTCAGTGTGA 2820

GCAATGACCT GAATGAACAA TTGAAATGAA AAGAAAAAAG TATTGTTCCA AATTAAACGT 2880

TTTAACCTTT TAATAGGTTT ATACAATAAT TGATATATGT TTTCTGTATA TGTCTAATTT 2940 

GTTATCATCC ATTTAGATAT AGACGAAAAA AAATCTAAGA ACTAAAACAA ATGCTAATTT 3000

GAAATGAAGG GAGTATATAT TGGGATAATG TCGATGAGAT CCCTCGTAAT ATCACCGACA 3060

TCACACGTGT CCAGTTAATG TATCAGTGAT ACGTGTATTC ACATTTGTTG CGCGTAGGCG 3120

TACCCAACAA TTTTGATCGA CTATCAGAAA GTCAACGGAA GCGAGTCGAC CTCGAGGGGG 3180 

GGCCCGGTAC CCAGCTTTTG TTCCCTTTAG TGAGGGTTAA TTGCGCGCTT GGCGTAATCA 3240

TGGTCATAGC TGTTTCCTGT GTGAAATTGT TATCCGCTCA CAATTCCACA CAACATACGA 3300

GCCGGAAGCA TAAAGTGTAA AGCCTGGGGT GCCTAATGAG TGAGCTAACT CACATTAATT 3360

GCGTTGCGCT CACTGCCCGC TTTCCAGTCG GGAAACCTGT CGTGCCAGCT GCATTAATGA 3420 

ATCGGCCAAC GCGCGGGGAG AGGCGGTTTG CGTATTGGGC GCTCTTCCGC TTCCTCGCTC 3480 

ACTGACTCGC TGCGCTCGGT CGTTCGGCTG CGGCGAGCGG TATCAGCTCA CTCAAAGGCG 3540

GTAATACGGT TATCCACAGA ATCAGGGGAT AACGCAGGAA AGAACATGTG AGCAAAAGGC 3600 

CAGCAAAAGG CCAGGAACCG TAAAAAGGCC GCGTTGCTGG CGTTTTTCCA TAGGCTCCGC 3660

CCCCCTGACG AGCATCACAA AAATCGACGC TCAAGTCAGA GGTGGCGAAA CCCGACAGGA 3720

CTATAAAGAT ACCAGGCGTT TCCCCCTGGA AGCTCCCTCG TGCGCTCTCC TGTTCCGACC 3780 

CTGCCGCTTA CCGGATACCT GTCCGCCTTT CTCCCTTCGG GAAGCGTGGC GCTTTCTCAT 3840

AGCTCACGCT GTAGGTATCT CAGTTCGGTG TAGGTCGTTC GCTCCAAGCT GGGCTGTGTG 3900

CACGAACCCC CCGTTCAGCC CGACCGCTGC GCCTTATCCG GTAACTATCG TCTTGAGTCC 3960

AACCCGGTAA GACACGACTT ATCGCCACTG GCAGCAGCCA CTGGTAACAG GATTAGCAGA 4020

GCGAGGTATG TAGGCGGTGC TACAGAGTTC TTGAAGTGGT GGCCTAACTA CGGCTACACT 4080

AGAAGGACAG TATTTGGTAT CTGCGCTCTG CTGAAGCCAG TTACCTTCGG AAAAAGAGTT 4140

GGTAGCTCTT GATCCGGCAA ACAAACCACC GCTGGTAGCG GTGGTTTTTT TGTTTGCAAG 4200

CAGCAGATTA CGCGCAGAAA AAAAGGATCT CAAGAAGATC CTTTGATCTT TTCTACGGGG 4260 

TCTGACGCTC AGTGGAACGA AAACTCACGT TAAGGGATTT TGGTCATGAG ATTATCAAAA 4320 

AGGATCTTCA CCTAGATCCT TTTAAATTAA AAATGAAGTT TTAAATCAAT CTAAAGTATA 4380 

TATGAGTAAA CTTGGTCTGA CAGTTACCAA TGCTTAATCA GTGAGGCACC TATCTCAGCG 4440

ATCTGTCTAT TTCGTTCATC CATAGTTGCC TGACTCCCCG TCGTGTAGAT AACTACGATA 4500

CGGGAGGGCT TACCATCTGG CCCCAGTGCT GCAATGATAC CGCGAGACCC ACGCTCACCG 4560 

GCTCCAGATT TATCAGCAAT AAACCAGCCA GCCGGAAGGG CCGAGCGCAG AAGTGGTCCT 4620

GCAACTTTAT CCGCCTCCAT CCAGTCTATT AATTGTTGCC GGGAAGCTAG AGTAAGTAGT 4680

TCGCCAGTTA ATAGTTTGCG CAACGTTGTT GCCATTGCTA CAGGCATCGT GGTGTCACGC 4740

TCGTCGTTTG GTATGGCTTC ATTCAGCTCC GGTTCCCAAC GATCAAGGCG AGTTACATGA 4800

TCCCCCATGT TGTGCAAAAA AGCGGTTAGC TCCTTCGGTC CTCCGATCGT TGTCAGAAGT 4860 

AAGTTGGCCG CAGTGTTATC ACTCATGGTT ATGGCAGCAC TGCATAATTC TCTTACTGTC 4920 

ATGCCATCCG TAAGATGCTT TTCTGTGACT GGTGAGTACT CAACCAAGTC ATTCTGAGAA 4980

TAGTGTATGC GGCGACCGAG TTGCTCTTGC CCGGCGTCAA TACGGGATAA TACCGCGCCA 5040

CATAGCAGAA CTTTAAAAGT GCTCATCATT GGAAAACGTT CTTCGGGGCG AAAACTCTCA 5100 

AGGATCTTAC CGCTGTTGAG ATCCAGTTCG ATGTAACCCA CTCGTGCACC CAACTGATCT 5160 

TCAGCATCTT TTACTTTCAC CAGCGTTTCT GGGTGAGCAA AAACAGGAAG GCAAAATGCC 5220 

GCAAAAAAGG GAATAAGGGC GACACGGAAA TGTTGAATAC TCATACTCTT CCTTTTTCAA 5280

TATTATTGAA GCATTTATCA GGGTTATTGT CTCATGAGCG GATACATATT TGAATGTATT 5340 

TAGAAAAATA AACAAATAGG GGTTCCGCGC ACATTTCCCC GAAAAGTGCC AC 5392

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5173 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

CTAAATTGTA AGCGTTAATA TTTTGTTAAA ATTCGCGTTA AATTTTTGTT AAATCAGCTC 60

ATTTTTTAAC CAATAGGCCG AAATCGGCAA AATCCCTTAT AAATCAAAAG AATAGACCGA 120

GATAGGGTTG AGTGTTGTTC CAGTTTGGAA CAAGAGTCCA CTATTAAAGA ACGTGGACTC 180

CAACGTCAAA GGGCGAAAAA CCGTCTATCA GGGCGATGGC CCACTACGTG AACCATCACC 240

CTAATCAAGT TTTTTGGGGT CGAGGTGCCG TAAAGCACTA AATCGGAACC CTAAAGGGAG 300

CCCCCGATTT AGAGCTTGAC GGGGAAAGCC GGCGAACGTG GCGAGAAAGG AAGGGAAGAA 360 

AGCGAAAGGA GCGGGCGCTA GGGCGCTGGC AAGTGTAGCG GTCACGCTGC GCGTAACCAC 420

CACACCCGCC GCGCTTAATG CGCCGCTACA GGGCGCGTCC CATTCGCCAT TCAGGCTGCG 480

CAACTGTTGG GAAGGGCGAT CGGTGCGGGC CTCTTCGCTA TTACGCCAGC TGGCGAAAGG 540 

GGGATGTGCT GCAAGGCGAT TAAGTTGGGT AACGCCAGGG TTTTCCCAGT CACGACGTTG 600

TAAAACGACG GCCAGTGAGC GCGCGTAATA CGACTCACTA TAGGGCGAAT TGGAGCTCCA 660

CCGCGGTGGC GGCCGCTCTA GATTATATAA TTTATAAGCT AAACAACCCG GCCCTAAAGC 720

ACTATCGTAT CACCTATCTA AATAAGTCAC GGGAGTTTCG AACGTCCACT TCGTCGCACG 780

GAATTGCATG TTTCTTGTTG GAAGCATATT CACGCAATCT CCACACATAA AGGTTTATGT 840 

ATAAACTTAC ATTTAGCTCA GTTTAATTAC AGTCTTATTT GGATGCATAT GTATGGTTCT 900 

CAATCCATAT AAGTTAGAGT AAAAAATAAG TTTAAATTTT ATCTTAATTC ACTCCAACAT 960 

ATATGGATCT ACAATACTCA TGTGCATCCA AACAAACTAC TTATATTGAG GTGAATTTGG 1020 

TAGAAATTAA ACTAACTTAC ACACTAAGCC AATCTTTACT ATATTAAAGC ACCAGTTTCA 1080

ACGATCGTCC CGCGTCAATA TTATTAAAAA ACTCCTACAT TTCTTTATAA TCAACCCGCA 1140

CTCTTATAAT CTCTTCTCTA CTACTATAAT AAGAGAGTTT ATGTACAAAA TAAGGTGAAA 1200

TTATCTATAA GTGTTCTGGA TATTGGTTGT TGGCTCCCAT ATTCACACAA CCTAATCAAT 1260

AGAAAACATA TGTTTTATTA AAACAAAATT TATCATATAT CATATATATA TATATATCAT 1320

ATATATATAT AAACCGTAGC AATGCACGGG CATATAACTA GTGCAACTTA ATACATGTGT 1380

GTATTAAGAT GAATAAGAGG GTATCCAAAT AAAAAACTTG TTGCTTACGT ATGGATCGAA 1440

AGGGGTTGGA AACGATTAAA CGATTAAATC TCTTCCTAGT CAAAATTGAA TAGAAGGAGA 1500 

TTTAATATAT CCCAATCCCC TTCGATCATC CAGGTGCAAC CGTATAAGTC CTAAAGTGGT 1560 

GAGGAACACG AAAGAACCAT GCATTGGCAT GTAAAGCTCC AAGAATTTGT TGTATCCTTA 1620 

ACAACTCACA GAACATCAAC CAAAATTGCA CGTCAAGGGT ATTGGGTAAG AAACAATCAA 1680 

ACAAATCCTC TCTGTGTGCA AAGAAACACG GTGAGTCATG CCGAGATCAT ACTCATCTGA 1740 

TATACATGCT TACAGCTCAC AAGACATTAC AAACAACTCA TATTGCATTA CAAAGATCGT 1800 

TTCATGAAAA ATAAAATAGG CCGGACAGGA CAAAAATCCT TGACGTGTAA AGTAAATTTA 1860 

CAACAAAAAA AAAGCCATAT GTCAAGCTAA ATCTAATTCG TTTTACGTAG ATCAACAACC 1920

TGTAGAAGGC AACAAAACTG AGCCACGCAG AAGTACAGAA TGATTCCAGA TGAACCATCG 1980 

ACGTGCTACG TAAAGAGAGT GACGAGTCAT ATACATTTGG CAAGAAACCA TGAAGCTGCC 2040 

TACAGCCGTA TCGGTGGCAT AAGAACACAA GAAATTGTGT TAATTAATCA AAGCTATAAA 2100 

TAACGCTCGC ATGCCTGTGC ACTTCTCCAT CACCACCACT GGGTCTTCAG ACCATTAGCT 2160

TTATCTACTC CAGAGCGCAG AAGAACCCGA TCGACACCAT GAAGTCGGTG GAGAAGAAAC 2220

CGAAGGGTGT GAAGACAGGT GCGGGTGACA AGCATAAGCT GAAGACAGAG TGGCCGGAGT 2280 

TGGTGGGGAA ATCGGTGGAG AAAGCCAAGA AGGTGATCCT GAAGGACAAG CCAGAGGCGC 2340 

AAATCATAGT TCTACCGGTT GGTACAAAGG TGGGTAAGCA TTATAAGATC GACAAGGTCA 2400 

AGCTTTTTGT GGATAAAAAG GACAACATCG CGCAGGTCCC CAGGGTCGGC TAGCCTCGAG 2460 

ATCCCCGGCG GTGTCCCCCA CTGAAGAAAC TATGTGCTGT AGTATAGCCG CTGGCTAGCT 2520 

AGCTAGTTGA GTCATTTAGC GGCGATGATT GAGTAATAAT GTGTCACGCA TCACCATGCA 2580

TGGGTGGCAG TCTCAGTGTG AGCAATGACC TGAATGAACA ATTGAAATGA AAAGAAAAAA 2640 

GTATTGTTCC AAATTAAACG TTTTAACCTT TTAATAGGTT TATACAATAA TTGATATATG 2700

TTTTCTGTAT ATGTCTAATT TGTTATCATC CATTTAGATA TAGACGAAAA AAAATCTAAG 2760

AACTAAAACA AATGCTAATT TGAAATGAAG GGAGTATATA TTGGGATAAT GTCGATGAGA 2820

TCCCTCGTAA TATCACCGAC ATCACACGTG TCCAGTTAAT GTATCAGTGA TACGTGTATT 2880

CACATTTGTT GCGCGTAGGC GTACCCAACA ATTTTGATCG ACTATCAGAA AGTCAACGGA 2940 

AGCGAGTCGA CCTCGAGGGG GGGCCCGGTA CCCAGCTTTT GTTCCCTTTA GTGAGGGTTA 3000 

ATTGCGCGCT TGGCGTAATC ATGGTCATAG CTGTTTCCTG TGTGAAATTG TTATCCGCTC 3060 

ACAATTCCAC ACAACATACG AGCCGGAAGC ATAAAGTGTA AAGCCTGGGG TGCCTAATGA 3120

GTGAGCTAAC TCACATTAAT TGCGTTGCGC TCACTGCCCG CTTTCCAGTC GGGAAACCTG 3180

TCGTGCCAGC TGCATTAATG AATCGGCCAA CGCGCGGGGA GAGGCGGTTT GCGTATTGGG 3240

CGCTCTTCCG CTTCCTCGCT CACTGACTCG CTGCGCTCGG TCGTTCGGCT GCGGCGAGCG 3300

GTATCAGCTC ACTCAAAGGC GGTAATACGG TTATCCACAG AATCAGGGGA TAACGCAGGA 3360

AAGAACATGT GAGCAAAAGG CCAGCAAAAG GCCAGGAACC GTAAAAAGGC CGCGTTGCTG 3420

GCGTTTTTCC ATAGGCTCCG CCCCCCTGAC GAGCATCACA AAAATCGACG CTCAAGTCAG 3480 

AGGTGGCGAA ACCCGACAGG ACTATAAAGA TACCAGGCGT TTCCCCCTGG AAGCTCCCTC 3540

GTGCGCTCTC CTGTTCCGAC CCTGCCGCTT ACCGGATACC TGTCCGCCTT TCTCCCTTCG 3600

GGAAGCGTGG CGCTTTCTCA TAGCTCACGC TGTAGGTATC TCAGTTCGGT GTAGGTCGTT 3660 

CGCTCCAAGC TGGGCTGTGT GCACGAACCC CCCGTTCAGC CCGACCGCTG CGCCTTATCC 3720 

GGTAACTATC GTCTTGAGTC CAAGCCGGTA AGACACGACT TATCGCCACT GGCAGCAGCC 3780

ACTGGTAACA GGATTAGCAG AGCGAGGTAT GTAGGCGGTG CTACAGAGTT CTTGAAGTGG 3840 

TGGCCTAACT ACGGCTACAC TAGAAGGACA GTATTTGGTA TCTGCGCTCT GCTGAAGCCA 3900 

GTTACCTTCG GAAAAAGAGT TGGTAGCTCT TGATCCGGCA AACAAACCAC CGCTGGTAGC 3960

GGTGGTTTTT TTGTTTGCAA GCAGCAGATT ACGCGCAGAA AAAAAGGATC TCAAGAAGAT 4020 

CCTTTGATCT TTTCTACGGG GTCTGACGCT CAGTGGAACG AAAACTCACG TTAAGGGATT 4080 

TTGGTCATGA GATTATCAAA AAGGATCTTC ACCTAGATCC TTTTAAATTA AAAATGAAGT 4140 

TTTAAATCAA TCTAAAGTAT ATATGAGTAA ACTTGGTCTG ACAGTTACCA ATGCTTAATC 4200 

AGTGAGGCAC CTATCTCAGC GATCTGTCTA TTTCGTTCAT CCATAGTTGC CTGACTCCCC 4260 

GTCGTGTAGA TAACTACGAT ACGGGAGGGC TTACCATCTG GCCCCAGTGC TGCAATGATA 4320 

CCGCGAGACC CACGCTCACC GGCTCCAGAT TTATCAGCAA TAAACCAGCC AGCCGGAAGG 4380 

GCCGAGCGCA GAAGTGGTCC TGCAACTTTA TCCGCCTCCA TCCAGTCTAT TAATTGTTGC 4440 

CGGGAAGCTA GAGTAAGTAG TTCGCCAGTT AATAGTTTGC GCAACGTTGT TGCCATTGCT 4500 

ACAGGCATCG TGGTGTCACG CTCGTCGTTT GGTATGGCTT CATTCAGCTC CGGTTCCCAA 4560 

CGATCAAGGC GAGTTACATG ATCCCCCATG TTGTGCAAAA AAGCGGTTAG CTCCTTCGGT 4620 

CCTCCGATCG TTGTCAGAAG TAAGTTGGCC GCAGTGTTAT CACTCATGGT TATGGCAGCA 4680 

CTGCATAATT CTCTTACTGT CATGCCATCC GTAAGATGCT TTTCTGTGAC TGGTGAGTAC 4740 

TCAACCAAGT CATTCTGAGA ATAGTGTATG CGGCGACCGA GTTGCTCTTG CCCGGCGTCA 4800 

ATACGGGATA ATACCGCGCC ACATAGCAGA ACTTTAAAAG TGCTCATCAT TGGAAAACGT 4860

TCTTCGGGGC GAAAACTCTC AAGGATCTTA CCGCTGTTGA GATCCAGTTC GATGTAACCC 4920

ACTCGTGCAC CCAACTGATC TTCAGCATCT TTTACTTTCA CCAGCGTTTC TGGGTGAGCA 4980 

AAAACAGGAA GGCAAAATGC CGCAAAAAAG GGAATAAGGG CGACACGGAA ATGTTGAATA 5040 

CTCATACTCT TCCTTTTTCA ATATTATTGA AGCATTTATC AGGGTTATTG TCTCATGAGC 5100 

GGATACATAT TTGAATGTAT TTAGAAAAAT AAACAAATAG GGGTTCCGCG CACATTTCCC 5160 

CGAAAAGTGC CAC 5173 

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 54 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

AGTATAAGTA AACACACCAT CACACCCTTG AGGCCCTTGC TGGTGGCCAT GGTG 54
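
Each entry states a LENGTH in base pairs that should agree with the number of bases actually listed. The following is a minimal sketch, not part of the original disclosure, of how that agreement could be checked for SEQ ID NO: 8; the helper name `clean_sequence` is hypothetical, and the sketch assumes the sequence contains only the letters A, C, G and T.

```python
def clean_sequence(listing_row):
    """Keep only nucleotide letters, dropping spaces and the trailing counter."""
    return "".join(ch for ch in listing_row.upper() if ch in "ACGT")


# Sequence row and declared length as given for SEQ ID NO: 8.
SEQ_ID_NO_8 = "AGTATAAGTA AACACACCAT CACACCCTTG AGGCCCTTGC TGGTGGCCAT GGTG 54"
DECLARED_LENGTH = 54

bases = clean_sequence(SEQ_ID_NO_8)
print(len(bases) == DECLARED_LENGTH)  # prints True
```

The same check can be applied to the other short entries, which declare lengths of 55, 35, 45, 50 and 50 base pairs.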
(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 55 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

CCTCACATCC CTTAGTGCCT AAGTTCGACG TCGGGCCCTC TAGTCGACGG ATCCA 55

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

AGCGGAAAAT GCCCGAAAGG CTTCCCCAAA TTGGC 35



(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 45 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

TGCGCAGGCG TCTGCAAGTG TAAGCTGACT AGTAGCGGAA AATGC 45
(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

TACAACCTTT GCAAAGTCAA AGGCGCCAAG AAGCTTTGCG CAGGCGTCTG 50
(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
GCAAGAGTTG CTGCAAGAGT ACCCTGGGAA GGAAGTGCTA CAACCTTTGC 50

The invention is not limited to the exact details shown and described, for it
should be understood that many variations and modifications may be made while
remaining within the spirit and scope of the invention defined by the claims.

WHAT IS CLAIMED IS:

1. A transformed cereal plant seed, the endosperm of which is characterized as
having an elevated level of at least one preselected amino acid compared to
a seed from a corresponding plant which has not been transformed, wherein
the amino acid is lysine, cysteine, threonine, tryptophan, arginine, valine,
leucine, isoleucine, histidine or combinations thereof and optionally
methionine.

2. The seed according to Claim 1 wherein the preselected amino acid is lysine,
threonine or tryptophan and optionally a sulfur-containing amino acid.

3. The seed according to Claim 2 wherein the preselected amino acid is lysine.

4. The seed according to Claim 3 wherein the preselected amino acid is lysine
and a sulfur-containing amino acid.

5. The seed according to Claim 1 wherein the plant is selected from the group
consisting of maize, wheat, rice, barley, oats, sorghum, millet and rye.

6. The seed according to Claim 5 which is a maize seed.

7. The seed according to Claim 1 wherein the plant expresses a transgenic
protein having an elevated level of the preselected amino acid.

8. The seed according to Claim 7 wherein the protein is barley chymotrypsin
inhibitor, barley alpha hordothionin, soybean 2S albumin protein, rice high
methionine protein, sunflower high methionine protein or derivatives of each
protein.

9. The seed according to Claim 1 wherein the amount of preselected amino acid
in the seed is increased at least about 10 percent by weight compared to a
corresponding seed which has not been transformed.

10. The seed according to Claim 9 wherein the amount of the preselected amino
acid in the seed is about 10 percent by weight to about 10 times greater
compared to a corresponding seed which has not been transformed.

11. The seed according to Claim 10 wherein the amount of the preselected amino
acid in the seed is about 15 percent by weight to about 10 times greater
compared to a corresponding seed which has not been transformed.

12. The seed according to Claim 11 wherein the amount of the preselected amino
acid in the seed is about 20 percent by weight to about 10 times greater
compared to a corresponding seed which has not been transformed.

13. An expression cassette comprising a seed endosperm-preferred promoter
operably linked to a structural gene encoding a polypeptide elevated in
content of a preselected amino acid.

14. The cassette according to Claim 13 wherein the promoter is a gamma zein
promoter or a waxy promoter.

15. A vector comprising the expression cassette of Claim 13.

16. A plant cell transformed with the vector of Claim 15.

17. A transformed plant comprising the vector of Claim 15.

18. A seed product obtainable from the transformed seed of Claim 1.

19. A seed from a cereal plant which has been transformed to express a
heterologous protein in the endosperm of the seed, wherein the seed exhibits
an elevated level of an essential amino acid compared to a plant which has
not been transformed.

20. A method for increasing the nutritional value of a cereal plant seed
comprising: transforming a host plant cell with a vector comprising an
expression cassette comprising a seed endosperm-preferred promoter
operably linked to a structural gene encoding a polypeptide elevated in
content of a preselected amino acid; recovering the transformed cells;
regenerating a transformed plant; and recovering the seeds therefrom.

21. A seed produced by the method of Claim 20.

ABSTRACT OF THE DISCLOSURE

Rudolf Jung 
Larry R. Beach 
Virginia M. Dress 
A. Gururaj Rao 
Jerome P. Ranch 

David S. Ertl 
Regina K. Higgins 

The present invention provides a plant seed the endosperm of which is 
characterized as having an elevated level of a preselected amino acid. The present 
invention also provides expression cassettes, vectors, plants, plant cells and a 
method for enhancing the nutritional value of seeds.

RAW SEQUENCE LISTING

PATENT APPLICATION US/09/020,716 



DATE: 02/27/98 
TIME: 18:53:29 



INPUT SET: S23914.raw 

47 

48 (ix) TELECOMMUNICATION INFORMATION: 

49 (A) TELEPHONE: 515-334-4467 

50 (B) TELEFAX: 515-334-6883 

51 (C) TELEX: 
52 

53 

54 (2) INFORMATION FOR SEQ ID NO:1:

55 



61 

62 (ii) MOLECULE TYPE: Other 
63 

64 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
65 

66 TCGACCTCGA GGGGGGGCCC GGTACCCAGC TTTTGTTCCC TTTAGTGAGG GTTAATTGCG 60 

67 CGCTTGGCGT AATCATGGTC ATAGCTGTTT CCTGTGTGAA ATTGTTATCC GCTCACAATT 120 
68 CCACACAACA TACGAGCCGG AAGCATAAAG TGTAAAGCCT GGGGTGCCTA ATGAGTGAGC 180

69 TAACTCACAT TAATTGCGTT GCGCTCACTG CCCGCTTTCC AGTCGGGAAA CCTGTCGTGC 240

70 CAGCTGCATT AATGAATCGG CCAACGCGCG GGGAGAGGCG GTTTGCGTAT TGGGCGCTCT 300

71 TCCGCTTCCT CGCTCACTGA CTCGCTGCGC TCGGTCGTTC GGCTGCGGCG AGCGGTATCA 360

72 GCTCACTCAA AGGCGGTAAT ACGGTTATCC ACAGAATCAG GGGATAACGC AGGAAAGAAC 420

73 ATGTGAGCAA AAGGCCAGCA AAAGGCCAGG AACCGTAAAA AGGCCGCGTT GCTGGCGTTT 480

74 TTCCATAGGC TCCGCCCCCC TGACGAGCAT CACAAAAATC GACGCTCAAG TCAGAGGTGG 540

75 CGAAACCCGA CAGGACTATA AAGATACCAG GCGTTTCCCC CTGGAAGCTC CCTCGTGCGC 600

76 TCTCCTGTTC CGACCCTGCC GCTTACCGGA TACCTGTCCG CCTTTCTCCC TTCGGGAAGC 660

77 GTGGCGCTTT CTCATAGCTC ACGCTGTAGG TATCTCAGTT CGGTGTAGGT CGTTCGCTCC 720

78 AAGCTGGGCT GTGTGCACGA ACCCCCCGTT CAGCCCGACC GCTGCGCCTT ATCCGGTAAC 780

79 TATCGTCTTG AGTCCAACCC GGTAAGACAC GACTTATCGC CACTGGCAGC AGCCACTGGT 840

80 AACAGGATTA GCAGAGCGAG GTATGTAGGC GGTGCTACAG AGTTCTTGAA GTGGTGGCCT 900

81 AACTACGGCT ACACTAGAAG GACAGTATTT GGTATCTGCG CTCTGCTGAA GCCAGTTACC 960

82 TTCGGAAAAA GAGTTGGTAG CTCTTGATCC GGCAAACAAA CCACCGCTGG TAGCGGTGGT 1020

83 TTTTTTGTTT GCAAGCAGCA GATTACGCGC AGAAAAAAAG GATCTCAAGA AGATCCTTTG 1080

84 ATCTTTTCTA CGGGGTCTGA CGCTCAGTGG AACGAAAACT CACGTTAAGG GATTTTGGTC 1140

85 ATGAGATTAT CAAAAAGGAT CTTCACCTAG ATCCTTTTAA ATTAAAAATG AAGTTTTAAA 1200 

86 TCAATCTAAA GTATATATGA GTAAACTTGG TCTGACAGTT ACCAATGCTT AATCAGTGAG 1260 

87 GCACCTATCT CAGCGATCTG TCTATTTCGT TCATCCATAG TTGCCTGACT CCCCGTCGTG 1320 

88 TAGATAACTA CGATACGGGA GGGCTTACCA TCTGGCCCCA GTGCTGCAAT GATACCGCGA 1380 

89 GACCCACGCT CACCGGCTCC AGATTTATCA GCAATAAACC AGCCAGCCGG AAGGGCCGAG 1440 

90 CGCAGAAGTG GTCCTGCAAC TTTATCCGCC TCCATCCAGT CTATTAATTG TTGCCGGGAA 1500

91 GCTAGAGTAA GTAGTTCGCC AGTTAATAGT TTGCGCAACG TTGTTGCCAT TGCTACAGGC 1560

92 ATCGTGGTGT CACGCTCGTC GTTTGGTATG GCTTCATTCA GCTCCGGTTC CCAACGATCA 1620

93 AGGCGAGTTA CATGATCCCC CATGTTGTGC AAAAAAGCGG TTAGCTCCTT CGGTCCTCCG 1680 

94 ATCGTTGTCA GAAGTAAGTT GGCCGCAGTG TTATCACTCA TGGTTATGGC AGCACTGCAT 1740 

95 AATTCTCTTA CTGTCATGCC ATCCGTAAGA TGCTTTTCTG TGACTGGTGA GTACTCAACC 1800 

96 AAGTCATTCT GAGAATAGTG TATGCGGCGA CCGAGTTGCT CTTGCCCGGC GTCAATACGG 1860 

97 GATAATACCG CGCCACATAG CAGAACTTTA AAAGTGCTCA TCATTGGAAA ACGTTCTTCG 1920 

98 GGGCGAAAAC TCTCAAGGAT CTTACCGCTG TTGAGATCCA GTTCGATGTA ACCCACTCGT 1980 

99 GCACCCAACT GATCTTCAGC ATCTTTTACT TTCACCAGCG TTTCTGGGTG AGCAAAAACA 2040

100 GGAAGGCAAA ATGCCGCAAA AAAGGGAATA AGGGCGACAC GGAAATGTTG AATACTCATA 2100

101 CTCTTCCTTT TTCAATATTA TTGAAGCATT TATCAGGGTT ATTGTCTCAT GAGCGGATAC 2160 

102 ATATTTGAAT GTATTTAGAA AAATAAACAA ATAGGGGTTC CGCGCACATT TCCCCGAAAA 2220 

103 GTGCCACCTA AATTGTAAGC GTTAATATTT TGTTAAAATT CGCGTTAAAT TTTTGTTAAA 2280 

104 TCAGCTCATT TTTTAACCAA TAGGCCGAAA TCGGCAAAAT CCCTTATAAA TCAAAAGAAT 2340 

105 AGACCGAGAT AGGGTTGAGT GTTGTTCCAG TTTGGAACAA GAGTCCACTA TTAAAGAACG 2400 

106 TGGACTCCAA CGTCAAAGGG CGAAAAACCG TCTATCAGGG CGATGGCCCA CTACGTGAAC 2460 

107 CATCACCCTA ATCAAGTTTT TTGGGGTCGA GGTGCCGTAA AGCACTAAAT CGGAACCCTA 2520 

108 AAGGGAGCCC CCGATTTAGA GCTTGACGGG GAAAGCCGGC GAACGTGGCG AGAAAGGAAG 2580 

109 GGAAGAAAGC GAAAGGAGCG GGCGCTAGGG CGCTGGCAAG TGTAGCGGTC ACGCTGCGCG 2640 

110 TAACCACCAC ACCCGCCGCG CTTAATGCGC CGCTACAGGG CGCGTCCCAT TCGCCATTCA 2700 

111 GGCTGCGCAA CTGTTGGGAA GGGCGATCGG TGCGGGCCTC TTCGCTATTA CGCCAGCTGG 2760 

112 CGAAAGGGGG ATGTGCTGCA AGGCGATTAA GTTGGGTAAC GCCAGGGTTT TCCCAGTCAC 2820 

113 GACGTTGTAA AACGACGGCC AGTGAGCGCG CGTAATACGA CTCACTATAG GGCGAATTGG 2880 

114 AGCTCGACCG CGGTGGCGGC CGCTCTAGAA CTAGTGGATC CGTCGACTAG AGGGCCCGAG 2940 

115 GTCGAACTTA GGCACTAAGG GATGTGAGGC CAGCATCACC GTTGCAGAAA TTGACACAAG 3000 

116 CATCACCACA ATTTTCCAAA TAGAGTTTCA TTTCTTCGTC GTCAGCAGCT GCGTTGACCA 3060 

117 TGTAGTCACA CATGGAAGCC CTACACCCCA AGTTGCAATA CTTGACGGTG TCTGGTTCAT 3120 

118 CTGAGTTGGA CACAAGGGCC AATTTGGGGA AGCCTGTAGG GCATTTTCCG CTACTTGTGA 3180 

119 GTTTACACCT ACAGACGCCT GCGCATAACT TCTGAGCACC ACGGACGCGG CAAAGGTTGT 3240 

120 AGCAGTTTCT TCCTAGGGTG CTCCTGCAGC AACTCTTGCC TTCTACTTGC ACCTGTTCGA 3300

121 GAACCAACCC CAGTATAAGT AAACACACCA TCACACCCTT GAGGCCCTTG CTGGTGGCCA 3360 

122 TGG 3363 
123 

124 (2) INFORMATION FOR SEQ ID NO: 2: 

125 

126 (i) SEQUENCE CHARACTERISTICS: 

127 (A) LENGTH: 3365 base pairs 

128 (B) TYPE: nucleic acid 

129 (C) STRANDEDNESS: single

130 (D) TOPOLOGY: linear 
131 

132 (ii) MOLECULE TYPE: Other 

133 

134 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

135 

136 TCGACCTCGA GGGGGGGCCC GGTACCCAGC TTTTGTTCCC TTTAGTGAGG GTTAATTGCG 60 

137 CGCTTGGCGT AATCATGGTC ATAGCTGTTT CCTGTGTGAA ATTGTTATCC GCTCACAATT 120 

138 CCACACAACA TACGAGCCGG AAGCATAAAG TGTAAAGCCT GGGGTGCCTA ATGAGTGAGC 180

139 TAACTCACAT TAATTGCGTT GCGCTCACTG CCCGCTTTCC AGTCGGGAAA CCTGTCGTGC 240 

140 CAGCTGCATT AATGAATCGG CCAACGCGCG GGGAGAGGCG GTTTGCGTAT TGGGCGCTCT 300 

141 TCCGCTTCCT CGCTCACTGA CTCGCTGCGC TCGGTCGTTC GGCTGCGGCG AGCGGTATCA 360 

142 GCTCACTCAA AGGCGGTAAT ACGGTTATCC ACAGAATCAG GGGATAACGC AGGAAAGAAC 420 

143 ATGTGAGCAA AAGGCCAGCA AAAGGCCAGG AACCGTAAAA AGGCCGCGTT GCTGGCGTTT 480 

144 TTCCATAGGC TCCGCCCCCC TGACGAGCAT CACAAAAATC GACGCTCAAG TCAGAGGTGG 540 

145 CGAAACCCGA CAGGACTATA AAGATACCAG GCGTTTCCCC CTGGAAGCTC CCTCGTGCGC 600 

146 TCTCCTGTTC CGACCCTGCC GCTTACCGGA TACCTGTCCG CCTTTCTCCC TTCGGGAAGC 660 

147 GTGGCGCTTT CTCATAGCTC ACGCTGTAGG TATCTCAGTT CGGTGTAGGT CGTTCGCTCC 720 

148 AAGCTGGGCT GTGTGCACGA ACCCCCCGTT CAGCCCGACC GCTGCGCCTT ATCCGGTAAC 780 

149 TATCGTCTTG AGTCCAACCC GGTAAGACAC GACTTATCGC CACTGGCAGC AGCCACTGGT 840

150 AACAGGATTA GCAGAGCGAG GTATGTAGGC GGTGCTACAG AGTTCTTGAA GTGGTGGCCT 900

151 AACTACGGCT ACACTAGAAG GACAGTATTT GGTATCTGCG CTCTGCTGAA GCCAGTTACC 960

152 TTCGGAAAAA GAGTTGGTAG CTCTTGATCC GGCAAACAAA CCACCGCTGG TAGCGGTGGT 1020



153 TTTTTTGTTT GCAAGCAGCA GATTACGCGC AGAAAAAAAG GATCTCAAGA AGATCCTTTG 1080

154 ATCTTTTCTA CGGGGTCTGA CGCTCAGTGG AACGAAAACT CACGTTAAGG GATTTTGGTC 1140

155 ATGAGATTAT CAAAAAGGAT CTTCACCTAG ATCCTTTTAA ATTAAAAATG AAGTTTTAAA 1200

156 TCAATCTAAA GTATATATGA GTAAACTTGG TCTGACAGTT ACCAATGCTT AATCAGTGAG 1260

157 GCACCTATCT CAGCGATCTG TCTATTTCGT TCATCCATAG TTGCCTGACT CCCCGTCGTG 1320

158 TAGATAACTA CGATACGGGA GGGCTTACCA TCTGGCCCCA GTGCTGCAAT GATACCGCGA 1380

159 GACCCACGCT CACCGGCTCC AGATTTATCA GCAATAAACC AGCCAGCCGG AAGGGCCGAG 1440

160 CGCAGAAGTG GTCCTGCAAC TTTATCCGCC TCCATCCAGT CTATTAATTG TTGCCGGGAA 1500

161 GCTAGAGTAA GTAGTTCGCC AGTTAATAGT TTGCGCAACG TTGTTGCCAT TGCTACAGGC 1560

162 ATCGTGGTGT CACGCTCGTC GTTTGGTATG GCTTCATTCA GCTCCGGTTC CCAACGATCA 1620

163 AGGCGAGTTA CATGATCCCC CATGTTGTGC AAAAAAGCGG TTAGCTCCTT CGGTCCTCCG 1680

164 ATCGTTGTCA GAAGTAAGTT GGCCGCAGTG TTATCACTCA TGGTTATGGC AGCACTGCAT 1740

165 AATTCTCTTA CTGTCATGCC ATCCGTAAGA TGCTTTTCTG TGACTGGTGA GTACTCAACC 1800

166 AAGTCATTCT GAGAATAGTG TATGCGGCGA CCGAGTTGCT CTTGCCCGGC GTCAATACGG 1860

167 GATAATACCG CGCCACATAG CAGAACTTTA AAAGTGCTCA TCATTGGAAA ACGTTCTTCG 1920

168 GGGCGAAAAC TCTCAAGGAT CTTACCGCTG TTGAGATCCA GTTCGATGTA ACCCACTCGT 1980

169 GCACCCAACT GATCTTCAGC ATCTTTTACT TTCACCAGCG TTTCTGGGTG AGCAAAAACA 2040

170 GGAAGGCAAA ATGCCGCAAA AAAGGGAATA AGGGCGACAC GGAAATGTTG AATACTCATA 2100

171 CTCTTCCTTT TTCAATATTA TTGAAGCATT TATCAGGGTT ATTGTCTCAT GAGCGGATAC 2160

172 ATATTTGAAT GTATTTAGAA AAATAAACAA ATAGGGGTTC CGCGCACATT TCCCCGAAAA 2220

173 GTGCCACCTA AATTGTAAGG GTTAATATTT TGTTAAAATT CGCGTTAAAT TTTTGTTAAA 2280

174 TCAGCTCATT TTTTAACCAA TAGGCCGAAA TCGGCAAAAT CCCTTATAAA TCAAAAGAAT 2340

175 AGACCGAGAT AGGGTTGAGT GTTGTTCCAG TTTGGAACAA GAGTCCACTA TTAAAGAACG 2400

176 TGGACTCCAA CGTCAAAGGG CGAAAAACCG TCTATCAGGG CGATGGCCCA CTACGTGAAC 2460

177 CATCACCCTA ATCAAGTTTT TTGGGGTCGA GGTGCCGTAA AGCACTAAAT CGGAACCCTA 2520

178 AAGGGAGCCC CCGATTTAGA GCTTGACGGG GAAAGCCGGC GAACGTGGCG AGAAAGGAAG 2580

179 GGAAGAAAGC GAAAGGAGCG GGCGCTAGGG CGCTGGCAAG TGTAGCGGTG ACGCTGCGCG 2640

180 TAACCACCAC ACCCGCCGCG CTTAATGCGC CGCTACAGGG CGCGTCCCAT TCGCCATTCA 2700

181 GGCTGCGCAA CTGTTGGGAA GGGCGATCGG TGCGGGCCTC TTCGCTATTA CGCCAGCTGG 2760

182 CGAAAGGGGG ATGTGCTGCA AGGCGATTAA GTTGGGTAAC GCCAGGGTTT TCCCAGTCAC 2820

183 GACGTTGTAA AACGACGGCC AGTGAGCGCG CGTAATACGA CTCACTATAG GGCGAATTGG 2880

184 AGCTCCACCG CGGTGGCGGC CGCTCTAGAA CTAGTGGATC CGTCGACTAG AGGGCCCGAC 2940

185 GTCGAACTTA GGCACTAAGG GATGTGAGGC CAGCATCACC GTTGCAGAAA TTGACACAAG 3000

186 CATCACCACA ATTTTCCAAA TAGAGTTTCA TTTCTTCGTC GTCAGCAGCT GCGTTGACCA 3060

187 TGTAGTCACA CATGGAAGCC CTACACCCCA AGTTGCAATA CTTGACGGTG TCTGGTTCAT 3120

188 CTGAGTTGGA CACAAGGGCC AATTTGGGGA AGCGTTTCGG GCATTTTCCG CTACTAGTCA 3180

189 GCTTACACTT GCAGACGCCT GCGCAAAGCT TCTTGGCGCC TTTGACTTTG CAAAGGTTGT 3240

190 AGCACTTCCT TCCCAGGGTA CTCTTGCAGC AACTCTTGCC TTCTACTTGC ACCTGTTCGA 3300

191 GAACCAACCC CAGTATAAGT AAACACACCA TCACACCCTT GAGGCCCTTG CTGGTGGCCA 3360

192 TGGTG 3365

193

194 (2) INFORMATION FOR SEQ ID NO: 3:



195
196 (i) SEQUENCE CHARACTERISTICS:
197 (A) LENGTH: 5360 base pairs
198 (B) TYPE: nucleic acid
199 (C) STRANDEDNESS: single
200 (D) TOPOLOGY: linear
201
202 (ii) MOLECULE TYPE: Other
203
204 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
205
206 CTAAATTGTA AGCGTTAATA TTTTGTTAAA ATTCGCGTTA AATTTTTGTT AAATCAGCTC 60

207 ATTTTTTAAC CAATAGGCCG AAATCGGCAA AATCCCTTAT AAATCAAAAG AATAGACCGA 120

208 GATAGGGTTG AGTGTTGTTG CAGTTTGGAA CAAGAGTCCA CTATTAAAGA ACGTGGACTC 180

209 CAACGTCAAA GGGCGAAAAA CCGTCTATCA GGGCGATGGC CCACTACGTG AACCATCACC 240

210 CTAATCAAGT TTTTTGGGGT CGAGGTGCGG TAAAGCACTA AATCGGAACC CTAAAGGGAG 300

211 CCCCCGATTT AGAGCTTGAC GGGGAAAGCC GGCGAACGTG GCGAGAAAGG AAGGGAAGAA 360

212 AGCGAAAGGA GCGGGCGGTA GGGCGCTGGC AAGTGTAGCG GTCACGCTGC GCGTAACCAC 420

213 CACACCCGCC GCGCTTAATG CGCCGCTACA GGGCGCGTCC CATTCGCCAT TCAGGCTGCG 480

214 CAACTGTTGG GAAGGGCGAT CGGTGCGGGC CTCTTCGCTA TTACGCCAGC TGGCGAAAGG 540

215 GGGATGTGCT GCAAGGCGAT TAAGTTGGGT AACGCCAGGG TTTTCCCAGT CACGACGTTG 600

216 TAAAACGACG GCCAGTGAGC GCGCGTAATA CGACTCACTA TAGGGCGAAT TGGAGCTCCA 660

217 CCGCGGTGGC GGCCGCTCTA GATTATATAA TTTATAAGCT AAACAACCCG GCCCTAAAGC 720

218 ACTATCGTAT CACCTATCTA AATAAGTCAC GGGAGTTTCG AACGTCCACT TCGTCGCACG 780

219 GAATTGCATG TTTCTTGTTG GAAGCATATT CACGCAATCT CCACACATAA AGGTTTATGT 840

220 ATAAACTTAC ATTTAGCTCA GTTTAATTAC AGTCTTATTT GGATGCATAT GTATGGTTCT 900

221 CAATCCATAT AAGTTAGAGT AAAAAATAAG TTTAAATTTT ATCTTAATTC ACTCCAACAT 960

222 ATATGGATCT ACAATACTCA TGTGCATCCA AACAAACTAC TTATATTGAG GTGAATTTGG 1020

223 TAGAAATTAA ACTAACTTAC ACACTAAGCC AATCTTTACT ATATTAAAGC ACCAGTTTCA 1080

224 ACGATCGTCC CGCGTCAATA TTATTAAAAA ACTCCTACAT TTCTTTATAA TCAACCCGCA 1140

225 CTCTTATAAT CTCTTCTCTA CTACTATAAT AAGAGAGTTT ATGTACAAAA TAAGGTGAAA 1200

226 TTATCTATAA GTGTTCTGGA TATTGGTTGT TGGCTCCCAT ATTCACACAA CCTAATCAAT 1260

227 AGAAAACATA TGTTTTATTA AAACAAAATT TATCATATAT CATATATATA TATATATCAT 1320

228 ATATATATAT AAACCGTAGC AATGCACGGG CATATAACTA GTGCAACTTA ATACATGTGT 1380

229 GTATTAAGAT GAATAAGAGG GTATCCAAAT AAAAAACTTG TTGCTTACGT ATGGATCGAA 1440

230 AGGGGTTGGA AACGATTAAA CGATTAAATC TCTTCCTAGT CAAAATTGAA TAGAAGGAGA 1500

231 TTTAATATAT CCCAATCCCC TTCGATCATC CAGGTGCAAC CGTATAAGTC CTAAAGTGGT 1560

232 GAGGAACACG AAAGAACCAT GCATTGGCAT GTAAAGCTGC AAGAATTTGT TGTATCCTTA 1620

233 ACAACTCACA GAACATCAAC CAAAATTGCA CGTCAAGGGT ATTGGGTAAG AAACAATCAA 1680

234 ACAAATCCTC TCTGTGTGCA AAGAAACACG GTGAGTCATG CCGAGATCAT ACTCATCTGA 1740

235 TATACATGCT TACAGGTCAC AAGACATTAC AAACAACTCA TATTGCATTA CAAAGATCGT 1800

236 TTCATGAAAA ATAAAATAGG CCGGACAGGA CAAAAATCCT TGACGTGTAA AGTAAATTTA 1860

237 CAACAAAAAA AAAGCCATAT GTCAAGCTAA ATCTAATTCG TTTTACGTAG ATCAACAACC 1920

238 TGTAGAAGGC AACAAAACTG AGCCACGCAG AAGTACAGAA TGATTCCAGA TGAACCATCG 1980

239 ACGTGCTACG TAAAGAGAGT GACGAGTCAT ATACATTTGG CAAGAAACCA TGAAGCTGCC 2040

240 TACAGCCGTC TCGGTGGCAT AAGAACACAA GAAATTGTGT TAATTAATCA AAGCTATAAA 2100

241 TAACGCTCGC ATGCCTGTGC ACTTCTCCAT CACCACCACT GGGTCTTCAG ACCATTAGCT 2160

242 TTATCTACTC CAGAGCGCAG AAGAACCCGA TCGACACCAT GGCCACCAGC AAGGGCCTCA 2220

243 AGGGTGTGAT GGTGTGTTTA CTTATACTGG GGTTGGTTCT CGAACAGGTG CAAGTAGAAG 2280

244 GCAAGAGTTG CTGCAAGAGT ACCCTGGGAA GGAAGTGCTA CAACCTTTGC AAAGTCAAAG 2340

245 GCGCCAAGAA GCTTTGCGCA GGCGTCTGCA AGTGTAAGCT GACTAGTAGC GGAAAATGCC 2400

246 CGAAAGGCTT CCCCAAATTG GCCCTTGTGT CCAACTCAGA TGAACCAGAC ACCGTCAAGT 2460

247 ATTGCAACTT GGGGTGTAGG GCTTCCATGT GTGACTACAT GGTCAACGCA GCTGCTGACG 2520

248 ACGAAGAAAT GAAACTCTAT TTGGAAAATT GTGGTGATGC TTGTGTCAAT TTCTGCAACG 2580

249 GTGATGCTGG CCTCACATCC CTTAGTGCCT AAGTTCGACG TCGGGCCCTC TAGTCGACGG 2640

250 ATCCCCGGCG GTGTCCCCCA CTGAAGAAAC TATGTGCTGT AGTATAGCCG CTGCCCGCTG 2700

251 GCTAGCTAGC TAGTTGAGTC ATTTAGCGGC GATGATTGAG TAATAATGTG TCACGCATCA 2760

252 CCATGCATGG GTGGCAGTGT CAGTGTGAGC AATGACCTGA ATGAACAATT GAAATGAAAA 2820

253 GAAAAAAGTA TTGTTCCAAA TTAAACGTTT TAACCTTTTA ATAGGTTTAT ACAATAATTG 2880

254 ATATATGTTT TCTGTATATG TCTAATTTGT TATCATCCAT TTAGATATAG ACAAAAAAAA 2940

255 ATCTAAGAAC TAAAACAAAT GCTAATTTGA AATGAAGGGA GTATATATTG GGATAATGTC 3000

256 GATGAGATCC CTCGTAATAT CACCGACATC ACACGTGTCC AGTTAATGTA TCAGTGATAC 3060

257 GTGTATTCAC ATTTGTTGCG CGTAGGCGTA CCCAACAATT TTGATCGACT ATCAGAAAGT 3120

258 CAACGGAAGC GAGTCGACCT CGAGGGGGGG CCCGGTACCC AGCTTTTGTT CCCTTTAGTG 3180



